Fact-Saboteurs: A Taxonomy of Evidence Manipulation Attacks against Fact-Verification Systems

Paper Title

Fact-Saboteurs: A Taxonomy of Evidence Manipulation Attacks against Fact-Verification Systems

Authors

Sahar Abdelnabi, Mario Fritz

Abstract

Mis- and disinformation are a substantial global threat to our security and safety. To cope with the scale of online misinformation, researchers have been working on automating fact-checking by retrieving and verifying against relevant evidence. However, despite many advances, a comprehensive evaluation of the possible attack vectors against such systems is still lacking. Particularly, the automated fact-verification process might be vulnerable to the exact disinformation campaigns it is trying to combat. In this work, we assume an adversary that automatically tampers with the online evidence in order to disrupt the fact-checking model via camouflaging the relevant evidence or planting a misleading one. We first propose an exploratory taxonomy that spans these two targets and the different threat model dimensions. Guided by this, we design and propose several potential attack methods. We show that it is possible to subtly modify claim-salient snippets in the evidence and generate diverse and claim-aligned evidence. Thus, we highly degrade the fact-checking performance under many different permutations of the taxonomy's dimensions. The attacks are also robust against post-hoc modifications of the claim. Our analysis further hints at potential limitations in models' inference when faced with contradicting evidence. We emphasize that these attacks can have harmful implications on the inspectable and human-in-the-loop usage scenarios of such models, and we conclude by discussing challenges and directions for future defenses.
