Writing Style Aware Document-level Event Extraction

Paper Title

Writing Style Aware Document-level Event Extraction

Authors

Xu, Zhuo; Wang, Yue; Bai, Lu; Cui, Lixin

Abstract

Event extraction, which aims to automatically obtain structured information from documents, has attracted increasing attention in many fields. Most existing works address this task within a token-level multi-label classification framework, classifying tokens into different argument roles while ignoring the writing style of documents. Writing style is a particular way of organizing a document's content, and it is relatively fixed across documents from a specific field (e.g., financial or medical documents). We argue that writing style contains important clues for judging the roles of tokens, and that ignoring such patterns may degrade the performance of existing works. To this end, we model the writing style of documents as a distribution of argument roles, i.e., the Role-Rank Distribution, and propose an event extraction model with a Role-Rank Distribution based Supervision Mechanism that captures this pattern through the supervised training process of the event extraction task. We compare our model with state-of-the-art methods on several real-world datasets. The empirical results show that our approach, using the captured patterns, outperforms the other alternatives. This verifies that writing style contains valuable information that can improve the performance of event extraction.
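
The abstract describes the Role-Rank Distribution only at a high level, so what follows is one hypothetical reading of it, not the authors' implementation. It assumes that "rank" means the order in which argument mentions appear in a document (0 = first), estimates an empirical distribution P(role | rank) from annotated training documents, and adds a KL-divergence term that pulls the model's predicted per-rank role distribution toward that empirical one, on top of the usual token-level loss. The function names (`role_rank_distribution`, `kl_supervision`, `total_loss`), the weight `0.1`, and the financial role labels are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

# HYPOTHETICAL sketch. Assumption: "rank" = order of appearance of an argument
# mention in a document. The paper's exact definition is not given in the abstract.


def role_rank_distribution(docs, roles):
    """Estimate P(role | rank) from documents given as ordered lists of role labels."""
    counts = defaultdict(Counter)
    for labels in docs:
        for rank, role in enumerate(labels):
            counts[rank][role] += 1
    dist = {}
    for rank, counter in counts.items():
        total = sum(counter.values())
        # Counter returns 0 for roles never seen at this rank.
        dist[rank] = {role: counter[role] / total for role in roles}
    return dist


def kl_supervision(target, pred, eps=1e-9):
    """Sum over ranks of KL(target[rank] || pred[rank]).

    `pred` is assumed to hold the model's averaged predicted role probabilities
    for the argument at each rank; ranks missing from `pred` are skipped.
    """
    loss = 0.0
    for rank, t_dist in target.items():
        p_dist = pred.get(rank)
        if p_dist is None:
            continue
        for role, t in t_dist.items():
            if t > 0:
                loss += t * math.log(t / max(p_dist.get(role, 0.0), eps))
    return loss


def total_loss(token_loss, target, pred, weight=0.1):
    """Hypothetical objective: token-level classification loss plus weighted KL term."""
    return token_loss + weight * kl_supervision(target, pred)


if __name__ == "__main__":
    roles = ["Company", "Date", "Amount"]  # hypothetical financial-event roles
    docs = [
        ["Company", "Date", "Amount"],
        ["Company", "Amount"],
        ["Company", "Date", "Amount"],
    ]
    target = role_rank_distribution(docs, roles)
    print(target[0])  # {'Company': 1.0, 'Date': 0.0, 'Amount': 0.0}
    # A model whose per-rank predictions are uniform incurs a positive penalty.
    uniform = {rank: {r: 1 / len(roles) for r in roles} for rank in target}
    print(total_loss(0.5, target, uniform) > 0.5)  # True
```

KL divergence is used here only because it is a standard way to match one distribution to another; a different divergence or a different notion of rank would fit the same idea of using a field's fixed writing style as extra supervision.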