Paper Title

When Hearst Is not Enough: Improving Hypernymy Detection from Corpus with Distributional Models

Authors

Changlong Yu, Jialong Han, Peifeng Wang, Yangqiu Song, Hongming Zhang, Wilfred Ng, Shuming Shi

Abstract

We address hypernymy detection, i.e., whether an is-a relationship exists between words (x, y), with the help of large textual corpora. Most conventional approaches to this task have been categorized to be either pattern-based or distributional. Recent studies suggest that pattern-based ones are superior, if large-scale Hearst pairs are extracted and fed, with the sparsity of unseen (x, y) pairs relieved. However, they become invalid in some specific sparsity cases, where x or y is not involved in any pattern. For the first time, this paper quantifies the non-negligible existence of those specific cases. We also demonstrate that distributional methods are ideal to make up for pattern-based ones in such cases. We devise a complementary framework, under which a pattern-based and a distributional model collaborate seamlessly in cases which they each prefer. On several benchmark datasets, our framework achieves competitive improvements and the case study shows its better interpretability.
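
The complementary idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify either model, so `make_detector`, `pattern_counts`, and `distributional_score` are hypothetical names, and the relative-frequency pattern score is a stand-in assumption. The sketch only shows the routing: use the pattern-based score when both x and y occur in some extracted Hearst pair, and fall back to a distributional score when x or y is not involved in any pattern.

```python
from typing import Callable, Dict, Set, Tuple

Pair = Tuple[str, str]  # (hyponym candidate x, hypernym candidate y)


def make_detector(
    pattern_counts: Dict[Pair, int],
    distributional_score: Callable[[str, str], float],
) -> Callable[[str, str], Tuple[float, str]]:
    """Build a scorer that routes each (x, y) pair to one of two models.

    pattern_counts: hypothetical counts of (x, y) pairs extracted by Hearst patterns.
    distributional_score: hypothetical fallback model, e.g. embedding-based.
    """
    # Words that take part in at least one extracted pattern pair.
    pattern_vocab: Set[str] = {word for pair in pattern_counts for word in pair}
    total = sum(pattern_counts.values())

    def score(x: str, y: str) -> Tuple[float, str]:
        if x in pattern_vocab and y in pattern_vocab:
            # Both words are covered by patterns: use the pair's relative
            # frequency (an assumed stand-in for a real pattern-based model).
            freq = pattern_counts.get((x, y), 0) / total if total else 0.0
            return freq, "pattern"
        # x or y never appears in any pattern: defer to the distributional model.
        return distributional_score(x, y), "distributional"

    return score


# Toy usage with made-up counts and a constant fallback score.
counts = {("dog", "animal"): 3, ("cat", "animal"): 1}
detect = make_detector(counts, lambda x, y: 0.5)
covered = detect("dog", "animal")       # both words seen in patterns
uncovered = detect("axolotl", "animal")  # "axolotl" seen in no pattern
```

In practice the two scores would need to be brought onto a comparable scale before thresholding or ranking; this sketch leaves that step out.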
