Paper Title

Research on Annotation Rules and Recognition Algorithm Based on Phrase Window

Authors

Liu, Guang, Tu, Gang, Li, Zheng, Liu, Yi-Jian

Abstract

At present, most natural language processing technology relies on dependency parsing performed over word segmentation results, mainly using end-to-end methods based on supervised learning. This approach has two main problems. First, the labeling rules are complex and the data is difficult to label, so the annotation workload is large. Second, the algorithm cannot recognize the multi-granularity and diversity of language components. To address these two problems, we propose labeling rules based on phrase windows and design a corresponding phrase recognition algorithm. The labeling rules use the phrase as the minimum unit, divide sentences into seven nestable phrase types, and mark the grammatical dependencies between phrases. The corresponding algorithm draws on the idea of identifying target regions in images: it finds the start and end positions of the various phrases in a sentence and recognizes nested phrases and grammatical dependencies simultaneously. The experimental results show that the labeling rules are convenient, easy to use, and unambiguous, and that the algorithm captures grammatical multi-granularity and diversity better than the end-to-end algorithm. Experiments on the CPWD dataset show an accuracy improvement of about 1 point over the end-to-end method. The method was also applied in the CCL2018 competition, where it took first place in the Chinese Metaphor Sentiment Analysis Task.
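To make the idea of nested phrase windows concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes each phrase is stored as a token span with a start position, an end position, and a type label. The class `PhraseWindow`, the helper functions, and the labels `S`, `NP`, and `VP` are all hypothetical; the abstract does not name the seven phrase types or describe the recognition model itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PhraseWindow:
    """A phrase as a token span. All names here are hypothetical."""
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    label: str  # placeholder phrase-type label, not from the paper


def length(w: PhraseWindow) -> int:
    return w.end - w.start


def is_nested(inner: PhraseWindow, outer: PhraseWindow) -> bool:
    """True if `inner` lies inside the strictly longer window `outer`."""
    return (outer.start <= inner.start
            and inner.end <= outer.end
            and length(outer) > length(inner))


def is_crossing(a: PhraseWindow, b: PhraseWindow) -> bool:
    """True if the two windows partially overlap without nesting."""
    return (a.start < b.start < a.end < b.end
            or b.start < a.start < b.end < a.end)


def well_nested(windows: List[PhraseWindow]) -> bool:
    """A set of windows forms a valid nesting if no pair crosses."""
    return not any(is_crossing(a, b)
                   for i, a in enumerate(windows)
                   for b in windows[i + 1:])


def parent_of(window: PhraseWindow,
              windows: List[PhraseWindow]) -> Optional[PhraseWindow]:
    """Return the smallest window that contains `window`, if any."""
    candidates = [w for w in windows if is_nested(window, w)]
    return min(candidates, key=length, default=None)


# Toy example over five tokens; the labels are placeholders.
tokens = ["the", "old", "man", "sat", "down"]
windows = [
    PhraseWindow(0, 5, "S"),
    PhraseWindow(0, 3, "NP"),
    PhraseWindow(3, 5, "VP"),
]
```

In this toy example, `parent_of(windows[1], windows)` returns the sentence-level window, which is one simple way a dependency between a phrase and its enclosing phrase could be read off from start and end positions alone.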
