Paper Title

EvaLDA: Efficient Evasion Attacks Towards Latent Dirichlet Allocation

Paper Authors

Qi Zhou, Haipeng Chen, Yitao Zheng, Zhen Wang

Paper Abstract

As one of the most powerful topic models, Latent Dirichlet Allocation (LDA) has been used in a vast range of tasks, including document understanding, information retrieval, and peer-reviewer assignment. Despite its tremendous popularity, the security of LDA has rarely been studied. This poses severe risks to security-critical tasks, such as sentiment analysis and peer-reviewer assignment, that are based on LDA. In this paper, we are interested in knowing whether LDA models are vulnerable to adversarial perturbations of benign document examples at inference time. We formalize the evasion attack against LDA models as an optimization problem and prove it to be NP-hard. We then propose a novel and efficient algorithm, EvaLDA, to solve it. We show the effectiveness of EvaLDA via extensive empirical evaluations. For instance, on the NIPS dataset, EvaLDA can, on average, promote the rank of a target topic from 10 to around 7 by replacing only 1% of the words in a victim document with similar words. Our work provides significant insights into the power and limitations of evasion attacks against LDA models.
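To make the attack setting concrete, below is a minimal Python sketch, not a reimplementation of EvaLDA: the toy corpus, the greedy single-word replacement loop, and the use of the entire vocabulary as the candidate pool (instead of the paper's similar-word constraint) are all illustrative assumptions. It fits a small LDA model with scikit-learn, then replaces words in a victim document within a roughly 1% budget so that a chosen target topic rises in the inferred topic ranking.

# Minimal sketch of the evasion-attack setting, NOT the EvaLDA algorithm:
# the corpus, the greedy search, and the whole-vocabulary candidate pool
# are illustrative assumptions for demonstration only.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

corpus = [
    "deep learning neural network training gradient descent",
    "topic model dirichlet allocation document inference",
    "reinforcement learning agent reward policy environment",
    "graph network node edge embedding community detection",
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
lda = LatentDirichletAllocation(n_components=4, random_state=0).fit(X)
vocab = vectorizer.get_feature_names_out()

def topic_rank(doc_words, target_topic):
    # Rank (0 = most probable) of target_topic in the inferred distribution.
    theta = lda.transform(vectorizer.transform([" ".join(doc_words)]))[0]
    return int(np.where(np.argsort(-theta) == target_topic)[0][0])

victim = "deep learning neural network training gradient agent reward".split()
target_topic = 1                     # hypothetical topic the attacker promotes
budget = max(1, len(victim) // 100)  # ~1% replacement budget, at least one word

for _ in range(budget):
    best_rank, best_pos, best_word = topic_rank(victim, target_topic), None, None
    for i, word in enumerate(victim):  # try every position...
        for cand in vocab:             # ...and every candidate replacement word
            if cand == word:
                continue
            trial = victim[:i] + [cand] + victim[i + 1:]
            rank = topic_rank(trial, target_topic)
            if rank < best_rank:
                best_rank, best_pos, best_word = rank, i, cand
    if best_pos is None:
        break  # no single replacement improves the target topic's rank
    victim[best_pos] = best_word

print("perturbed document:", " ".join(victim))
print("final rank of target topic:", topic_rank(victim, target_topic))

Note that this exhaustive greedy search scales poorly with document and vocabulary size, which is exactly why the paper's NP-hardness result matters: EvaLDA replaces such brute-force search with an efficient approximate algorithm.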
