Paper title
RuArg-2022: Argument Mining Evaluation
Paper authors
Paper abstract
Argumentation analysis is a field of computational linguistics that studies methods for extracting arguments from texts, identifying the relationships between them, and building the argumentation structure of texts. This paper is the organizers' report on the first competition of argumentation analysis systems for Russian-language texts, held within the framework of the Dialogue conference. Participants were offered two tasks: stance detection and argument classification. A corpus of 9,550 sentences (comments on social media posts) on three topics related to the COVID-19 pandemic (vaccination, quarantine, and wearing masks) was prepared, annotated, and used for training and testing. The system that took first place in both tasks combined the NLI (Natural Language Inference) variant of the BERT architecture, automatic translation into English in order to apply a specialized BERT model retrained on Twitter posts discussing COVID-19, and additional masking of target entities. This system achieved an F1-score of 0.6968 on the stance detection task and an F1-score of 0.7404 on the argument classification task. We hope that the prepared dataset and baselines will foster further research on argument mining for the Russian language.
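The abstract mentions an NLI formulation combined with masking of target entities. As a rough illustration only, the sketch below shows one way a comment could be turned into premise/hypothesis pairs for an NLI-style classifier. The label names, hypothesis templates, keyword lists, and the `[TARGET]` mask token are all assumptions made for this example; the abstract does not specify them, and this is not the winning system's code.

```python
import re

# Hypothetical stance labels and hypothesis templates (not given in the abstract).
STANCE_HYPOTHESES = {
    "for": "The author supports [TARGET].",
    "against": "The author opposes [TARGET].",
    "other": "The author is neutral about [TARGET].",
}

# Hypothetical keyword lists for the three topics named in the abstract.
TARGET_KEYWORDS = {
    "masks": ["masks", "mask"],
    "vaccination": ["vaccination", "vaccines", "vaccine"],
    "quarantine": ["quarantine", "lockdown"],
}


def mask_target(text, topic, mask_token="[TARGET]"):
    """Replace every whole-word mention of the topic's keywords with a mask token."""
    # Longest keywords first so that "masks" is tried before "mask".
    keywords = sorted(TARGET_KEYWORDS[topic], key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.sub(pattern, mask_token, text, flags=re.IGNORECASE)


def build_nli_pairs(text, topic):
    """Return (premise, hypothesis, label) triples, one per candidate stance label."""
    premise = mask_target(text, topic)
    return [
        (premise, hypothesis, label)
        for label, hypothesis in STANCE_HYPOTHESES.items()
    ]


if __name__ == "__main__":
    comment = "Masks do not help, wearing a mask is pointless"
    for premise, hypothesis, label in build_nli_pairs(comment, "masks"):
        print(label, "|", premise, "|", hypothesis)
```

In a setup like this, an NLI model would score each pair for entailment and the label whose hypothesis scores highest would be taken as the predicted stance. Masking the target keeps the model from relying on topic-specific wording.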