Paper Title
When Hearst Is not Enough: Improving Hypernymy Detection from Corpus with Distributional Models
Paper Authors
Paper Abstract
We address hypernymy detection, i.e., deciding whether an is-a relationship exists between a pair of words (x, y), with the help of large textual corpora. Most conventional approaches to this task are categorized as either pattern-based or distributional. Recent studies suggest that pattern-based approaches are superior, provided that large-scale Hearst pairs are extracted and fed, relieving the sparsity of unseen (x, y) pairs. However, they become invalid in some specific sparsity cases, where x or y is not involved in any pattern. For the first time, this paper quantifies the non-negligible existence of those specific cases. We also demonstrate that distributional methods are ideal to make up for pattern-based ones in such cases. We devise a complementary framework under which a pattern-based model and a distributional model collaborate seamlessly, each handling the cases it is better suited to. On several benchmark datasets, our framework achieves competitive improvements, and the case study shows its better interpretability.
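The routing idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the toy counts, the toy vectors, and the names `pattern_score`, `distributional_score`, and `hypernymy_score` are all assumptions made for illustration. The only part taken from the abstract is the routing rule, namely that a pair falls back to the distributional model when x or y is not involved in any pattern. The cosine similarity below is a symmetric placeholder, not a real hypernymy measure.

```python
import math
from collections import Counter

# Toy Hearst-pattern counts, (hyponym, hypernym) -> frequency. Illustrative data only.
HEARST_COUNTS = Counter({
    ("dog", "animal"): 12,
    ("cat", "animal"): 9,
    ("oak", "tree"): 4,
})

# Toy word vectors standing in for a distributional model. Illustrative data only.
VECTORS = {
    "dog": [1.0, 0.0],
    "poodle": [1.0, 0.0],
    "animal": [1.0, 1.0],
}

# Every word that takes part in at least one extracted pattern.
PATTERN_VOCAB = {word for pair in HEARST_COUNTS for word in pair}
TOTAL_COUNT = sum(HEARST_COUNTS.values())


def pattern_score(x, y):
    """Relative frequency of the extracted pair (x, y); 0.0 if never extracted."""
    return HEARST_COUNTS[(x, y)] / TOTAL_COUNT


def distributional_score(x, y):
    """Cosine similarity of the two vectors (placeholder); 0.0 if a vector is missing."""
    u, v = VECTORS.get(x), VECTORS.get(y)
    if u is None or v is None:
        return 0.0
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hypernymy_score(x, y):
    """Route the pair: pattern-based if both words occur in some pattern,
    otherwise fall back to the distributional model."""
    if x in PATTERN_VOCAB and y in PATTERN_VOCAB:
        return "pattern", pattern_score(x, y)
    return "distributional", distributional_score(x, y)
```

In this sketch, `hypernymy_score("dog", "animal")` is handled by the pattern-based branch, while `hypernymy_score("poodle", "animal")` is routed to the distributional branch because "poodle" never appears in an extracted pair.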