Paper title
Homonymy Information for English WordNet
Paper authors
Paper abstract
A widely acknowledged shortcoming of WordNet is that it does not distinguish between word meanings that are systematically related (polysemy) and those that are coincidental (homonymy). Several previous works have attempted to fill this gap by inferring this information with computational methods. We revisit this task and exploit recent advances in language modelling to synthesise homonymy annotation for Princeton WordNet. Previous approaches treat the problem using clustering methods; by contrast, our method works by linking WordNet to the Oxford English Dictionary, which contains the information we need. To perform this alignment, we pair definitions based on their proximity in an embedding space produced by a Transformer model. Despite the simplicity of this approach, our best model attains an F1 of 0.97 on an evaluation set that we annotate. The outcome of our work is a high-quality homonymy annotation layer for Princeton WordNet, which we release.
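The pairing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors stand in for Transformer embeddings of definitions, the sense and entry identifiers are hypothetical, and the rule that two senses are homonymous when their linked dictionary senses fall under different entries is an assumption about how the linked information would be used.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_definitions(wordnet_defs, dict_defs):
    """Link each WordNet sense to its nearest dictionary sense in embedding space."""
    return {
        wn_key: max(dict_defs, key=lambda k: cosine(wn_vec, dict_defs[k]))
        for wn_key, wn_vec in wordnet_defs.items()
    }


def are_homonyms(sense_a, sense_b, pairs, entry_of):
    """Assumed rule: senses linked to different dictionary entries are homonyms."""
    return entry_of[pairs[sense_a]] != entry_of[pairs[sense_b]]


# Toy stand-ins for definition embeddings (hypothetical identifiers and values).
wordnet_defs = {
    "bank.river": [0.9, 0.1, 0.0],
    "bank.institution": [0.1, 0.9, 0.1],
    "bank.building": [0.0, 0.8, 0.3],
}
dict_defs = {
    "entry1.sense1": [1.0, 0.0, 0.0],  # sloping land beside water
    "entry2.sense1": [0.0, 1.0, 0.0],  # financial institution
}
entry_of = {"entry1.sense1": "entry1", "entry2.sense1": "entry2"}

pairs = pair_definitions(wordnet_defs, dict_defs)
```

With these toy vectors, the river sense links to one entry and the two financial senses link to the other, so `are_homonyms` separates the river sense from the financial ones while treating the financial pair as related. In practice the vectors would come from a sentence-encoding model, and a similarity threshold might be needed for senses with no good match.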