When Hearst Is not Enough: Improving Hypernymy Detection from Corpus with Distributional Models

Paper Title

When Hearst Is not Enough: Improving Hypernymy Detection from Corpus with Distributional Models

Authors

Changlong Yu, Jialong Han, Peifeng Wang, Yangqiu Song, Hongming Zhang, Wilfred Ng, Shuming Shi

Abstract

We address hypernymy detection, i.e., whether an is-a relationship exists between words (x, y), with the help of large textual corpora. Most conventional approaches to this task have been categorized to be either pattern-based or distributional. Recent studies suggest that pattern-based ones are superior, if large-scale Hearst pairs are extracted and fed, with the sparsity of unseen (x, y) pairs relieved. However, they become invalid in some specific sparsity cases, where x or y is not involved in any pattern. For the first time, this paper quantifies the non-negligible existence of those specific cases. We also demonstrate that distributional methods are ideal to make up for pattern-based ones in such cases. We devise a complementary framework, under which a pattern-based and a distributional model collaborate seamlessly in cases which they each prefer. On several benchmark datasets, our framework achieves competitive improvements and the case study shows its better interpretability.
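
The complementary idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a PPMI score over extracted Hearst pairs as the pattern-based model and plain cosine similarity over word vectors as a stand-in for the distributional model (cosine is symmetric, so it is only a placeholder for a real hypernymy scorer). The function names, the toy pairs, and the toy vectors are all hypothetical.

```python
import math
from collections import Counter


def ppmi_scores(hearst_pairs):
    """PPMI for each (hyponym, hypernym) pair extracted by Hearst patterns."""
    pair_counts = Counter(hearst_pairs)
    total = sum(pair_counts.values())
    hypo_counts = Counter(x for x, _ in hearst_pairs)
    hyper_counts = Counter(y for _, y in hearst_pairs)
    scores = {}
    for (x, y), n in pair_counts.items():
        pmi = math.log((n * total) / (hypo_counts[x] * hyper_counts[y]))
        scores[(x, y)] = max(0.0, pmi)
    return scores


def cosine(u, v):
    """Cosine similarity; a placeholder for a distributional hypernymy score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hypernymy_score(x, y, pattern_scores, pattern_vocab, vectors):
    """Route a pair to whichever model can handle it.

    Assumption: the pattern-based score is used only when both words occur
    in at least one extracted Hearst pair; otherwise the pair falls into the
    sparsity case described in the abstract and the distributional score is
    used instead.
    """
    if x in pattern_vocab and y in pattern_vocab:
        return "pattern", pattern_scores.get((x, y), 0.0)
    if x in vectors and y in vectors:
        return "distributional", cosine(vectors[x], vectors[y])
    return "none", 0.0


# Toy data (hypothetical): "lynx" never appears in any Hearst pattern.
hearst_pairs = [("cat", "animal"), ("dog", "animal"), ("cat", "pet")]
pattern_scores = ppmi_scores(hearst_pairs)
pattern_vocab = {w for pair in hearst_pairs for w in pair}
vectors = {"lynx": [1.0, 0.0], "animal": [1.0, 0.0], "dog": [0.0, 1.0]}

print(hypernymy_score("dog", "animal", pattern_scores, pattern_vocab, vectors))
print(hypernymy_score("lynx", "animal", pattern_scores, pattern_vocab, vectors))
```

Returning the route name alongside the score makes it visible which model produced each decision, which is one simple way to see how often the pattern-based model has no coverage for a pair.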
