Paper title

A description of Turkish Discourse Bank 1.2 and an examination of common dependencies in Turkish discourse

Authors

Deniz Zeyrek, Mustafa Erolcan Er

Abstract

We describe Turkish Discourse Bank 1.2, the latest version of a discourse corpus annotated for explicitly or implicitly conveyed discourse relations, their constitutive units, and senses in the Penn Discourse Treebank style. We present an evaluation of the recently added tokens and examine three commonly occurring dependency patterns that hold among the constitutive units of a pair of adjacent discourse relations, namely, shared arguments, full embedding and partial containment of a discourse relation. We present three major findings: (a) implicitly conveyed relations occur more often than explicitly conveyed relations in the data; (b) it is much more common for two adjacent implicit discourse relations to share an argument than for two adjacent explicit relations to do so; (c) both full embedding and partial containment of discourse relations are pervasive in the corpus, which can be partly due to subordinator connectives whose preposed subordinate clause tends to be selected together with the matrix clause rather than being selected alone. Finally, we briefly discuss the implications of our findings for Turkish discourse parsing.
