Similarity Detection Pipeline for Crawling a Topic Related Fake News Corpus

Paper Title

Similarity Detection Pipeline for Crawling a Topic Related Fake News Corpus

Authors

Vogel, Inna; Choi, Jeong-Eun; Meghana, Meghana

Abstract

Fake news detection is a challenging task that aims to reduce the human time and effort needed to check the truthfulness of news. Automated approaches to combating fake news, however, are limited by the lack of labeled benchmark datasets, especially in languages other than English. Moreover, many publicly available corpora have specific limitations that make them difficult to use. To address this problem, our contribution is threefold. First, we propose a new, publicly available German topic-related corpus for fake news detection. To the best of our knowledge, this is the first corpus of its kind. In this regard, we developed a pipeline for crawling similar news articles. As our third contribution, we conduct different learning experiments to detect fake news. The best performance was achieved using sentence-level embeddings from SBERT in combination with a Bi-LSTM (k = 0.88).