Title
Topology of Word Embeddings: Singularities Reflect Polysemy
Authors
Abstract
The manifold hypothesis suggests that word vectors live on a submanifold within their ambient vector space. We argue that we should, more accurately, expect them to live on a pinched manifold: a singular quotient of a manifold obtained by identifying some of its points. The identified, singular points correspond to polysemous words, i.e. words with multiple meanings. Our point of view suggests that monosemous and polysemous words can be distinguished based on the topology of their neighbourhoods. We present two kinds of empirical evidence to support this point of view: (1) We introduce a topological measure of polysemy based on persistent homology that correlates well with the actual number of meanings of a word. (2) We propose a simple, topologically motivated solution to the SemEval-2010 task on Word Sense Induction & Disambiguation that produces competitive results.
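The abstract does not spell out its persistent-homology measure, so the following is not the paper's method. It is a minimal sketch, under our own assumptions, of how "the topology of a neighbourhood" can separate a pinched (singular) point from a regular one. Assumptions: Euclidean distance; the punctured neighbourhood of a word vector is approximated by an annulus with hypothetical radii `r_in` and `r_out`; only 0-dimensional persistent homology (connected components) of the Vietoris-Rips filtration is used; and the toy data is two orthogonal planes in R^4 that meet at a single point, standing in for two meanings glued together at one word. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def annulus(centre: Point, cloud: List[Point], r_in: float, r_out: float) -> List[Point]:
    """Points whose distance to `centre` lies in [r_in, r_out] (a punctured neighbourhood)."""
    return [p for p in cloud if r_in <= dist(p, centre) <= r_out]


def h0_death_times(points: List[Point]) -> List[float]:
    """Finite death times of H0 classes in the Vietoris-Rips filtration.

    Every point is born at scale 0; a component dies when an edge merges it
    into another, so the death times are the minimum-spanning-tree edge
    lengths, found here with Kruskal's algorithm and union-find.
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    return deaths


def components_at_scale(points: List[Point], eps: float) -> int:
    """Number of H0 classes still alive at scale eps (0 for an empty set)."""
    if not points:
        return 0
    return 1 + sum(1 for d in h0_death_times(points) if d > eps)


def circle(radius: float, n: int, axes: Tuple[int, int], dim: int = 4) -> List[List[float]]:
    """n evenly spaced points on a circle of `radius` in the coordinate plane `axes`."""
    pts = []
    for k in range(n):
        p = [0.0] * dim
        p[axes[0]] = radius * math.cos(2 * math.pi * k / n)
        p[axes[1]] = radius * math.sin(2 * math.pi * k / n)
        pts.append(p)
    return pts


if __name__ == "__main__":
    word = [0.0] * 4                    # the word vector sits at the pinch point
    sheet_a = circle(1.0, 12, (0, 1))   # neighbours from one "meaning"
    sheet_b = circle(1.0, 12, (2, 3))   # neighbours from an orthogonal "meaning"
    # Pinched point: the punctured neighbourhood splits into two pieces.
    print(components_at_scale(annulus(word, sheet_a + sheet_b, 0.8, 1.2), eps=1.0))  # prints 2
    # Regular point on a single sheet: the punctured neighbourhood stays connected.
    print(components_at_scale(annulus(word, sheet_a, 0.8, 1.2), eps=1.0))  # prints 1
```

The scale `eps = 1.0` is chosen to sit between the spacing of neighbours within one sheet (about 0.52) and the gap between the two sheets (about 1.41). In real embeddings no such clean gap is guaranteed, which is why a persistence-based measure looks at how long components survive across all scales rather than at a single threshold.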