Paper Title
Transformer-Based Models for Automatic Identification of Argument Relations: A Cross-Domain Evaluation
Paper Authors
Paper Abstract
Argument Mining is defined as the task of automatically identifying and extracting argumentative components (e.g., premises, claims, etc.) and detecting the existing relations among them (i.e., support, attack, rephrase, no relation). One of the main issues when approaching this problem is the lack of data and the limited size of the publicly available corpora. In this work, we use the recently annotated US2016 debate corpus. US2016 is the largest existing argument-annotated corpus, which allows exploring the benefits of the most recent advances in Natural Language Processing in a complex domain like Argument (relation) Mining. We present an exhaustive analysis of the behavior of transformer-based models (i.e., BERT, XLNet, RoBERTa, DistilBERT, and ALBERT) when predicting argument relations. Finally, we evaluate the models in five different domains, with the objective of finding the least domain-dependent model. We obtain a macro F1-score of 0.70 with the US2016 evaluation corpus, and a macro F1-score of 0.61 with the Moral Maze cross-domain corpus.
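The abstract frames argument relation identification as classifying the relation (support, attack, rephrase, no relation) between a pair of argumentative units with a pretrained transformer. The sketch below is a minimal illustration of that setup, not the authors' code: it uses the Hugging Face `transformers` library, an assumed label set and model checkpoint, and an untrained classification head that would still need fine-tuning on US2016.

```python
# Minimal sketch (illustrative assumptions, not the paper's implementation):
# argument relation identification cast as sequence-pair classification.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["support", "attack", "rephrase", "no-relation"]  # assumed label order

# Any of the evaluated architectures (BERT, XLNet, RoBERTa, DistilBERT, ALBERT)
# could be plugged in here via its checkpoint name.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=len(LABELS)
)

# Two argumentative units whose relation we want to predict (made-up example).
unit_a = "Raising the minimum wage reduces poverty."
unit_b = "The minimum wage should be increased."

# Encode the pair as a single input: [CLS] unit_a [SEP] unit_b [SEP]
inputs = tokenizer(unit_a, unit_b, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

predicted = LABELS[logits.argmax(dim=-1).item()]
print(predicted)  # meaningless until the classification head is fine-tuned
```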