Paper Title

Fake News Spreader Detection on Twitter using Character N-Grams. Notebook for PAN at CLEF 2020

Authors

Inna Vogel, Meghana Meghana

Abstract

The authors of fake news often take facts from verified news sources and mix them with misinformation to create confusion and provoke unrest among readers. The spread of fake news can therefore have serious implications for our society: it can sway political elections, push down stock prices, or damage the reputations of corporations and public figures. Several websites have taken on the mission of checking rumors and allegations, but they are often not fast enough to check all the news being disseminated. Social media websites in particular offer an easy platform for the fast propagation of information. To limit the propagation of fake news among social media users, this year's PAN 2020 challenge focuses on fake news spreaders. The aim of the task is to determine whether authors who have shared fake news in the past can be discriminated from those who have never done so. In this notebook, we describe our profiling system for the fake news spreader detection task on Twitter. We experiment with different feature extraction techniques and learning algorithms from a multilingual perspective, namely English and Spanish. Our final submitted systems use character n-grams as features, in combination with a linear SVM for English and Logistic Regression for Spanish. Our submitted models achieve an overall accuracy of 73% and 79% on the official English and Spanish test sets, respectively. Our experiments show that it is difficult to reliably distinguish fake news spreaders on Twitter from users who share credible information, which leaves room for further investigation. Our model ranked 3rd out of 72 competitors.
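As a rough illustration of the character n-gram features mentioned in the abstract, the sketch below counts overlapping character n-grams across all tweets of one author. It is not the authors' implementation: the function names, the lowercasing step, and the n-gram range of 2 to 3 are assumptions chosen for the example, since the abstract does not state them. In a full system, count vectors like these (usually reweighted, for example with TF-IDF) would be the input to a linear classifier such as a linear SVM or logistic regression.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=3):
    """Count every overlapping character n-gram of length n_min..n_max.

    The default range (2 to 3) is an assumption for this example only.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def profile_features(tweets, n_min=2, n_max=3):
    """Aggregate n-gram counts over all tweets of one author.

    Each tweet is processed separately, so no n-gram spans two tweets.
    Lowercasing is an assumed preprocessing step, not taken from the paper.
    """
    counts = Counter()
    for tweet in tweets:
        counts.update(char_ngrams(tweet.lower(), n_min, n_max))
    return counts


# Hypothetical two-tweet author profile.
features = profile_features(["Fake news!", "fake"])
print(features.most_common(5))
```

Character n-grams capture spelling, punctuation, and sub-word patterns without language-specific tokenization, which is one reason the same kind of feature can be applied to both English and Spanish tweets.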
