Semi-Supervised Multi-aspect Detection of Misinformation using Hierarchical Joint Decomposition

Paper Title

Semi-Supervised Multi-aspect Detection of Misinformation using Hierarchical Joint Decomposition

Authors

Sara Abdali, Neil Shah, Evangelos E. Papalexakis

Abstract

Distinguishing between misinformation and real information is one of the most challenging problems in today's interconnected world. The vast majority of state-of-the-art methods for detecting misinformation are fully supervised, requiring a large number of high-quality human annotations. However, the availability of such annotations cannot be taken for granted, since producing them is very costly, time-consuming, and difficult to do at a pace that keeps up with the proliferation of misinformation. In this work, we are interested in exploring scenarios where the number of annotations is limited. In such scenarios, we investigate how tapping into a diverse set of resources that characterize a news article, henceforth referred to as "aspects", can compensate for the lack of labels. In particular, our contributions in this paper are twofold: 1) we propose the use of three different aspects: article content, context of social sharing behaviors, and host website/domain features, and 2) we introduce a principled tensor-based embedding framework that combines all those aspects effectively. We propose HiJoD, a 2-level decomposition pipeline that not only outperforms state-of-the-art methods, with F1-scores of 74% and 81% on the Twitter and Politifact datasets respectively, but is also an order of magnitude faster than similar ensemble approaches.
