Paper Title


PRADA: Practical Black-Box Adversarial Attacks against Neural Ranking Models

Paper Authors

Chen Wu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

Paper Abstract


Neural ranking models (NRMs) have shown remarkable success in recent years, especially with pre-trained language models. However, deep neural models are notorious for their vulnerability to adversarial examples. Adversarial attacks may become a new type of web spamming technique given our increased reliance on neural information retrieval models. Therefore, it is important to study potential adversarial attacks to identify vulnerabilities of NRMs before they are deployed. In this paper, we introduce the Word Substitution Ranking Attack (WSRA) task against NRMs, which aims to promote a target document in rankings by adding adversarial perturbations to its text. We focus on the decision-based black-box attack setting, where the attackers cannot directly access the model information but can only query the target model to obtain the rank positions of a partial retrieved list. This attack setting is realistic in real-world search engines. We propose a novel Pseudo Relevance-based ADversarial ranking Attack method (PRADA) that learns a surrogate model based on Pseudo Relevance Feedback (PRF) to generate gradients for finding the adversarial perturbations. Experiments on two web search benchmark datasets show that PRADA can outperform existing attack strategies and successfully fool the NRM with small, indiscernible perturbations of text.
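To make the setup concrete, the sketch below illustrates the attack pipeline the abstract describes: probe the decision-based black-box oracle for rank positions only, treat the observed orderings as pseudo relevance feedback to fit a white-box surrogate ranker, and follow the surrogate's gradients to pick a word substitution. This is a minimal toy sketch under stated assumptions, not the authors' implementation; every identifier here (black_box_rank, Surrogate, the toy vocabulary and scoring) is an illustrative assumption.

```python
# Hypothetical sketch of a PRADA-style pipeline; all names and the toy
# scoring are illustrative assumptions, NOT the authors' code.
import torch
import torch.nn as nn

VOCAB = ["ranking", "neural", "model", "attack", "search", "retrieval",
         "document", "query", "robust", "spam"]
W2I = {w: i for i, w in enumerate(VOCAB)}

def bow(text):
    """Bag-of-words vector over the toy vocabulary."""
    v = torch.zeros(len(VOCAB))
    for w in text.split():
        if w in W2I:
            v[W2I[w]] += 1.0
    return v

class Surrogate(nn.Module):
    """Tiny white-box stand-in trained to imitate the target ranker."""
    def __init__(self, dim=len(VOCAB)):
        super().__init__()
        self.M = nn.Parameter(torch.randn(dim, dim) * 0.01)
    def forward(self, q, d):          # relevance score s(q, d)
        return q @ self.M @ d

def black_box_rank(query, doc):
    """Decision-based oracle stand-in: only a rank position comes back."""
    overlap = len(set(query.split()) & set(doc.split()))
    return max(1, 20 - 5 * overlap)   # smaller = ranked higher (toy)

# 1) Pseudo relevance feedback: probe the target, label pairs by rank.
query = "neural ranking model"
docs = ["neural ranking model retrieval", "document spam attack", "search query"]
ranks = [black_box_rank(query, d) for d in docs]
pairs = [(a, b) for a in range(len(docs)) for b in range(len(docs))
         if ranks[a] < ranks[b]]      # doc a observed above doc b

# 2) Fit the surrogate with a pairwise hinge loss on the PRF orderings.
surr = Surrogate()
opt = torch.optim.Adam(surr.parameters(), lr=0.1)
q = bow(query)
for _ in range(200):
    loss = sum(torch.relu(1.0 - (surr(q, bow(docs[a])) - surr(q, bow(docs[b]))))
               for a, b in pairs)
    opt.zero_grad(); loss.backward(); opt.step()

# 3) Use surrogate gradients to pick a substitution that should promote
#    the target document, then verify with one more oracle query.
target = "document spam attack"
d = bow(target).requires_grad_(True)
surr(q, d).backward()
best = VOCAB[int(d.grad.argmax())]    # word whose count most raises the score
perturbed = target.replace("spam", best, 1)
print(f"substitute 'spam' -> '{best}':",
      black_box_rank(query, target), "->", black_box_rank(query, perturbed))
```

The pairwise hinge loss reflects the key constraint of the decision-based setting: rank positions are the only signal exposed, so the surrogate only needs to reproduce the oracle's observed orderings, after which its gradients supply the white-box guidance the black box withholds.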
