Paper title

Context-theoretic Semantics for Natural Language: an Algebraic Framework

Author

Clarke, Daoud

Abstract

Techniques in which words are represented as vectors have proved useful in many applications in computational linguistics; however, there is currently no general semantic formalism for representing meaning in terms of vectors. We present a framework for natural language semantics in which words, phrases and sentences are all represented as vectors, based on a theoretical analysis which assumes that meaning is determined by context. In the theoretical analysis, we define a corpus model as a mathematical abstraction of a text corpus. The meaning of a string of words is assumed to be a vector representing the contexts in which it occurs in the corpus model. Based on this assumption, we can show that the vector representations of words can be considered as elements of an algebra over a field. We note that in applications of vector spaces to representing meanings of words there is an underlying lattice structure; we interpret the partial ordering of the lattice as describing entailment between meanings. We also define the context-theoretic probability of a string and, based on this and the lattice structure, a degree of entailment between strings. Together these properties form guidelines as to how to construct semantic representations within the framework. A context theory is an implementation of the framework; in an implementation, strings are represented as vectors with the properties deduced from the theoretical analysis. We show how to incorporate logical semantics into context theories; this enables us to represent statistical information about uncertainty by taking weighted sums of individual representations. We also use the framework to analyse approaches to the task of recognising textual entailment, to ontological representations of meaning and to representing syntactic structure. For the latter, we give new algebraic descriptions of link grammar.
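To make the ideas of context vectors, lattice ordering and degree of entailment more concrete, here is a minimal Python sketch. It is not taken from the paper: the toy corpus, the function names, the choice of a "context" as the pair of word sequences surrounding a string within one document, the componentwise minimum as the lattice meet, and the ratio of L1 norms as the degree of entailment are all assumptions made for illustration.

```python
from collections import Counter

def context_vector(corpus, target):
    """Count the (left, right) contexts in which `target` occurs.

    Assumption: a context is the pair of word sequences surrounding
    the target string inside a single document of the corpus.
    """
    target_words = tuple(target.split())
    n = len(target_words)
    counts = Counter()
    for doc in corpus:
        words = tuple(doc.split())
        for i in range(len(words) - n + 1):
            if words[i:i + n] == target_words:
                counts[(words[:i], words[i + n:])] += 1
    return counts

def meet(u, v):
    """Lattice meet, assumed here to be the componentwise minimum."""
    return Counter({c: min(u[c], v[c]) for c in u if c in v})

def degree_of_entailment(u, v):
    """Assumed form: ||u meet v||_1 / ||u||_1, a value in [0, 1]."""
    total = sum(u.values())
    if total == 0:
        return 0.0
    return sum(meet(u, v).values()) / total

# Hypothetical toy corpus: "animal" occurs in every context "cat" does.
corpus = [
    "the cat sat",
    "the cat ran",
    "the dog sat",
    "the animal sat",
    "the animal ran",
    "the animal ate",
]

cat = context_vector(corpus, "cat")
dog = context_vector(corpus, "dog")
animal = context_vector(corpus, "animal")

cat_to_animal = degree_of_entailment(cat, animal)
animal_to_cat = degree_of_entailment(animal, cat)
cat_to_dog = degree_of_entailment(cat, dog)
```

In this toy corpus every context of "cat" is also a context of "animal", so under the assumed definitions "cat" entails "animal" to degree 1, while the reverse direction and the pair "cat"/"dog" score lower. The asymmetry is what lets the partial ordering be read as entailment.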
