Title

Semantic Relatedness for Keyword Disambiguation: Exploiting Different Embeddings

Authors

María G. Buey, Carlos Bobed, Jorge Gracia, Eduardo Mena

Abstract

Understanding the meaning of words is crucial for many tasks that involve human-machine interaction. This has been tackled by research on Word Sense Disambiguation (WSD) in the Natural Language Processing (NLP) field. Recently, WSD and many other NLP tasks have taken advantage of embedding-based representations of words, sentences, and documents. However, when it comes to WSD, most embedding models suffer from ambiguity, as they do not capture the different possible meanings of a word. Even when they do, the list of possible meanings for a word (the sense inventory) has to be known in advance, at training time, so that it can be included in the embedding space. Unfortunately, there are situations in which such a sense inventory is not known in advance (e.g., an ontology selected at run time), or it evolves over time and diverges from its state at training time. This hampers the use of embedding models for WSD. Furthermore, traditional WSD techniques do not perform well when the available linguistic information is very scarce, as in the case of keyword-based queries. In this paper, we propose an approach to keyword disambiguation that is grounded on a measure of semantic relatedness between words and the senses provided by an external inventory (an ontology) that is not known at training time. Building on previous work, we present a semantic relatedness measure that uses word embeddings, and we explore different disambiguation algorithms that also exploit both word and sentence representations. Experimental results show that this approach achieves results comparable with the state of the art when applied to WSD, without training for a particular domain.
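
To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of embedding-based keyword disambiguation: each candidate sense drawn from an external inventory is scored by the cosine similarity between the averaged embedding of its gloss and the averaged embedding of the query keywords, and the highest-scoring sense is selected. The toy EMBEDDINGS table, the example sense inventory, and the disambiguate helper are hypothetical placeholders; in practice, a pre-trained embedding model and the labels/glosses of an ontology chosen at run time would take their place.

```python
# Illustrative sketch of relatedness-based keyword disambiguation.
# Assumes a (here, toy) word-embedding table and a sense inventory
# whose senses are described by short glosses of words.
import numpy as np

EMBEDDINGS = {  # hypothetical pre-trained word vectors (dim = 3 for brevity)
    "bank":    np.array([0.9, 0.1, 0.0]),
    "money":   np.array([0.8, 0.2, 0.1]),
    "loan":    np.array([0.7, 0.1, 0.2]),
    "deposit": np.array([0.8, 0.3, 0.1]),
    "river":   np.array([0.1, 0.9, 0.2]),
    "water":   np.array([0.0, 0.8, 0.3]),
    "shore":   np.array([0.2, 0.9, 0.1]),
}

def embed(words):
    """Average the embeddings of the words we have vectors for."""
    vecs = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    return np.mean(vecs, axis=0) if vecs else None

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def disambiguate(keyword, context, senses):
    """Return the sense whose gloss is most related to the query context.

    `senses` maps a sense identifier to its gloss (a list of words), as it
    could be extracted from an ontology not seen at training time.
    """
    ctx_vec = embed([keyword] + context)
    if ctx_vec is None:
        return None, float("-inf")
    best_sense, best_score = None, float("-inf")
    for sense_id, gloss in senses.items():
        gloss_vec = embed(gloss)
        if gloss_vec is None:
            continue
        score = cosine(ctx_vec, gloss_vec)
        if score > best_score:
            best_sense, best_score = sense_id, score
    return best_sense, best_score

if __name__ == "__main__":
    senses = {  # hypothetical sense inventory for "bank"
        "bank#finance":   ["money", "loan", "deposit"],
        "bank#geography": ["river", "water", "shore"],
    }
    print(disambiguate("bank", ["money", "loan"], senses))  # -> bank#finance
```

Averaging word vectors is only one simple way to obtain a context/gloss representation; the paper also considers sentence-level representations, which could replace the `embed` step in this sketch.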
