FakeCovid: A Multilingual Cross-domain Fact Check News Dataset for COVID-19

Paper Title

FakeCovid -- A Multilingual Cross-domain Fact Check News Dataset for COVID-19

Authors

Shahi, Gautam Kishore; Nandini, Durgesh

Abstract

In this paper, we present the first multilingual cross-domain dataset of 5,182 fact-checked news articles on COVID-19, collected from 4 January 2020 to 15 May 2020. We collected the fact-checked articles from 92 different fact-checking websites after obtaining references from Poynter and Snopes. We manually annotated the articles into 11 categories of fact-checked news according to their content. The dataset covers 40 languages from 105 countries. We built a classifier to detect fake news and present results both for automatic fake news detection and for identifying its class. Our model achieves an F1 score of 0.76 in distinguishing the false class from other fact-checked articles. The FakeCovid dataset is available on GitHub.
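The abstract reports an F1 score of 0.76 for separating the false class from other fact-checked articles. To make that metric concrete, here is a minimal sketch of a one-vs-rest F1 score in plain Python. The label strings `"false"` and `"other"`, the function name `f1_for_class`, and the toy predictions are hypothetical illustrations; they are not taken from the FakeCovid data or from the authors' classifier.

```python
def f1_for_class(y_true, y_pred, positive="false"):
    """F1 score treating `positive` as the positive class and every other label as negative.

    `positive="false"` is a hypothetical label name chosen for illustration.
    """
    pairs = list(zip(y_true, y_pred))
    # True positives: gold label and prediction are both the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive, but the gold label is something else.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: gold label is positive, but the prediction is something else.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: precision and recall are 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy, made-up labels: two hits, one false positive, one false negative.
    gold = ["false", "false", "other", "other", "false"]
    pred = ["false", "other", "other", "false", "false"]
    print(round(f1_for_class(gold, pred), 3))  # prints 0.667
```

Treating every non-false article as a single negative class mirrors the binary framing in the abstract ("the false class and other fact check articles"); a multi-class evaluation over all 11 categories would instead compute this score per class and then average.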