Paper Title
Do Syntax Trees Help Pre-trained Transformers Extract Information?
Paper Authors
Paper Abstract
Much recent work suggests that incorporating syntax information from dependency trees can improve task-specific transformer models. However, the effect of incorporating dependency tree information into pre-trained transformer models (e.g., BERT) remains unclear, especially given recent studies highlighting how these models implicitly encode syntax. In this work, we systematically study the utility of incorporating dependency trees into pre-trained transformers on three representative information extraction tasks: semantic role labeling (SRL), named entity recognition, and relation extraction. We propose and investigate two distinct strategies for incorporating dependency structure: a late fusion approach, which applies a graph neural network on the output of a transformer, and a joint fusion approach, which infuses syntax structure into the transformer attention layers. These strategies are representative of prior work, but we introduce additional model design elements that are necessary for obtaining improved performance. Our empirical analysis demonstrates that these syntax-infused transformers obtain state-of-the-art results on SRL and relation extraction tasks. However, our analysis also reveals a critical shortcoming of these models: we find that their performance gains are highly contingent on the availability of human-annotated dependency parses, which raises important questions regarding the viability of syntax-augmented transformers in real-world applications.
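To make the late fusion strategy described above more concrete, here is a minimal, hypothetical PyTorch sketch of the general idea: a single graph-convolution layer that propagates information along dependency-tree edges over contextual token states produced by a pre-trained transformer (e.g., BERT). The class name `DependencyGCNLayer`, the adjacency construction, and the residual connection are illustrative assumptions, not the authors' implementation.

```python
# Sketch of "late fusion": a graph neural network layer applied on top of
# transformer outputs, using the dependency tree as the graph structure.
# Names and design details here are assumptions for illustration only.
import torch
import torch.nn as nn


class DependencyGCNLayer(nn.Module):
    """One graph-convolution layer over a dependency adjacency matrix."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.linear = nn.Linear(hidden_dim, hidden_dim)
        self.activation = nn.ReLU()

    def forward(self, token_states: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden), e.g., BERT's final hidden states
        # adj: (batch, seq_len, seq_len) dependency adjacency with self-loops
        degree = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        neighborhood = torch.bmm(adj / degree, token_states)  # average over neighbors
        return self.activation(self.linear(neighborhood)) + token_states  # residual


if __name__ == "__main__":
    batch, seq_len, hidden = 2, 6, 768          # 768 = BERT-base hidden size
    token_states = torch.randn(batch, seq_len, hidden)  # stand-in for transformer output
    adj = torch.eye(seq_len).repeat(batch, 1, 1)         # self-loops
    adj[:, 0, 1] = 1.0                                    # a toy dependency edge (0 <- 1)
    adj[:, 1, 0] = 1.0                                    # treated as undirected here
    gcn = DependencyGCNLayer(hidden)
    fused = gcn(token_states, adj)                        # syntax-aware token representations
    print(fused.shape)  # torch.Size([2, 6, 768])
```

The joint fusion strategy differs in where syntax enters the model: instead of stacking a graph network on top of the transformer, the dependency structure would constrain or bias the transformer's own attention layers.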