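BPGC at SemEval-2020 Task 11: Propaganda Detection in News Articles with Multi-Granularity Knowledge Sharing and Linguistic Features based Ensemble Learning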

Paper Title

BPGC at SemEval-2020 Task 11: Propaganda Detection in News Articles with Multi-Granularity Knowledge Sharing and Linguistic Features based Ensemble Learning

Authors

Rajaswa Patil, Somesh Singh, Swati Agarwal

Abstract

Propaganda spreads the ideology and beliefs of like-minded people, brainwashing their audiences and sometimes leading to violence. SemEval 2020 Task-11 aims to design automated systems for news propaganda detection. Task-11 consists of two sub-tasks: Span Identification, in which the system, given any news article, tags the specific fragments that contain at least one propaganda technique; and Technique Classification, in which the system classifies a given propagandist statement into one of 14 propaganda techniques. For sub-task 1, we use contextual embeddings extracted from pre-trained transformer models to represent the text data at various granularities and propose a multi-granularity knowledge sharing approach. For sub-task 2, we use an ensemble of BERT and logistic regression classifiers with linguistic features. Our results reveal that linguistic features are strong indicators for covering minority classes in a highly imbalanced dataset.
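
The abstract does not say how the BERT and logistic regression outputs are combined for sub-task 2. As a minimal sketch of one common option, the snippet below blends two per-class probability distributions by weighted averaging (soft voting). The function name `soft_vote`, the `weight_a` parameter, and the example label names and probabilities are all illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, Tuple


def soft_vote(
    prob_a: Dict[str, float],
    prob_b: Dict[str, float],
    weight_a: float = 0.5,
) -> Tuple[str, Dict[str, float]]:
    """Blend two class-probability dicts by weighted average (assumed scheme).

    prob_a and prob_b map class labels to probabilities from two classifiers.
    weight_a is the weight given to prob_a; prob_b receives 1 - weight_a.
    Returns the highest-scoring label and the blended distribution.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    labels = set(prob_a) | set(prob_b)
    blended = {
        label: weight_a * prob_a.get(label, 0.0)
        + (1.0 - weight_a) * prob_b.get(label, 0.0)
        for label in labels
    }
    # Sort labels first so that ties are broken deterministically.
    best = max(sorted(blended), key=lambda label: blended[label])
    return best, blended


if __name__ == "__main__":
    # Hypothetical outputs for one text fragment.
    transformer_probs = {"Loaded_Language": 0.7, "Bandwagon": 0.1, "Doubt": 0.2}
    feature_probs = {"Loaded_Language": 0.3, "Bandwagon": 0.5, "Doubt": 0.2}
    label, scores = soft_vote(transformer_probs, feature_probs, weight_a=0.5)
    print(label, scores)
```

Giving more weight to the feature-based classifier (a smaller `weight_a`) lets it override the transformer on classes the transformer rarely predicts, which is one way an ensemble could help with minority classes in an imbalanced dataset.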
