Paper Title
Certified Robustness to Word Substitution Ranking Attack for Neural Ranking Models
Paper Authors
Paper Abstract
Neural ranking models (NRMs) have achieved promising results in information retrieval, but they have also been shown to be vulnerable to adversarial examples. A typical Word Substitution Ranking Attack (WSRA) against NRMs was proposed recently, in which an attacker promotes a target document in the rankings by adding human-imperceptible perturbations to its text. This raises concerns about deploying NRMs in real-world applications, so it is important to develop techniques that defend NRMs against such attacks. In empirical defenses, adversarial examples are found during training and used to augment the training set. However, such methods offer no theoretical guarantee of the model's robustness and may eventually be broken by other, more sophisticated WSRAs. To escape this arms race, rigorous and provable certified defense methods for NRMs are needed. To this end, we first define \textit{Certified Top-$K$ Robustness} for ranking models, since users mainly care about the top-ranked results in real-world scenarios. A ranking model is said to be Certified Top-$K$ Robust on a ranked list when it is guaranteed to keep documents that are outside the top $K$ from entering the top $K$ under any attack. We then introduce a certified defense method, named CertDR, that achieves certified top-$K$ robustness against WSRA, based on the idea of randomized smoothing. Specifically, we first construct a smoothed ranker by applying random word substitutions to the documents, and then leverage the ranking property jointly with the statistical properties of the ensemble to provably certify top-$K$ robustness. Extensive experiments on two representative web search datasets demonstrate that CertDR significantly outperforms state-of-the-art empirical defense methods for ranking models.
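To make the smoothed-ranker construction concrete, the sketch below illustrates the core idea in the abstract: score a document by averaging a base ranker's scores over many random word substitutions. This is only a minimal illustration, not the paper's CertDR implementation; the substitution rate, the toy `overlap_ranker`, and all function names here are assumptions introduced for the example.

```python
import random

def random_substitution(tokens, synonyms, rate=0.3, rng=random):
    """Replace each token with a random synonym with probability `rate`.

    `synonyms` maps a word to a list of substitutes (a stand-in for the
    attack's perturbation set); `rate` is an assumed perturbation rate.
    """
    out = []
    for tok in tokens:
        if tok in synonyms and rng.random() < rate:
            out.append(rng.choice(synonyms[tok]))
        else:
            out.append(tok)
    return out

def smoothed_score(ranker, query, doc_tokens, synonyms, n_samples=200, seed=0):
    """Monte-Carlo estimate of the smoothed ranker's score: the expected
    base-ranker score over random word substitutions of the document."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        perturbed = random_substitution(doc_tokens, synonyms, rng=rng)
        total += ranker(query, perturbed)
    return total / n_samples

# Toy base ranker (an assumption for illustration): the fraction of
# query terms that appear in the document.
def overlap_ranker(query, doc_tokens):
    terms = query.split()
    return sum(t in doc_tokens for t in terms) / max(len(terms), 1)
```

Ranking documents by `smoothed_score` instead of the raw score gives the ensemble whose statistical properties (e.g., confidence bounds on these expectations) the certification argument would reason about: if the lower bound on a top-$K$ document's smoothed score exceeds the upper bound achievable by any outside document under the allowed substitutions, the top-$K$ list is certifiably stable.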