Paper Title
Gradient-based adversarial attacks on categorical sequence models via traversing an embedded world
Paper Authors
Paper Abstract
Deep learning models suffer from a phenomenon called adversarial attacks: we can apply minor changes to the model input to fool a classifier for a particular example. The literature mostly considers adversarial attacks on models with images and other structured inputs. However, adversarial attacks on categorical sequences can also be harmful. Successful attacks on inputs in the form of categorical sequences must address the following challenges: (1) non-differentiability of the target function, (2) constraints on transformations of the initial sequence, and (3) the diversity of possible problems. We handle these challenges with two black-box adversarial attacks. The first approach adopts a Monte-Carlo method and can be used in any scenario; the second uses a continuous relaxation of the models and target metrics, and thus allows state-of-the-art adversarial-attack methods to be applied with little additional effort. Results on money-transaction, medical-fraud, and NLP datasets suggest that the proposed methods generate reasonable adversarial sequences that are close to the original ones but fool machine-learning models.
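The abstract only names the two attack strategies, so the sketches below illustrate the general ideas rather than the paper's implementation. The first is a Monte-Carlo-style black-box search that randomly edits a few tokens and keeps the closest edit that flips the classifier's decision; the second relaxes the categorical sequence with a Gumbel-softmax so that gradient-based attack machinery can be reused. All names here (classify, vocab, model, embedding) are placeholders assumed for illustration.

import random

def monte_carlo_attack(sequence, classify, vocab, n_samples=200, max_edits=2, seed=0):
    # Black-box search: randomly replace up to `max_edits` tokens and keep the
    # perturbation with the fewest changes that flips the predicted label.
    # `classify` (sequence -> label) and `vocab` (list of valid tokens) are
    # assumed placeholders, not the paper's API.
    rng = random.Random(seed)
    original_label = classify(sequence)
    best = None  # (number of changed positions, adversarial sequence)
    for _ in range(n_samples):
        candidate = list(sequence)
        for pos in rng.sample(range(len(candidate)), k=rng.randint(1, max_edits)):
            candidate[pos] = rng.choice(vocab)  # swap in a random token
        if classify(candidate) != original_label:
            n_changed = sum(a != b for a, b in zip(candidate, sequence))
            if best is None or n_changed < best[0]:
                best = (n_changed, candidate)
    return None if best is None else best[1]

The relaxation-based variant below assumes a differentiable classifier operating on embedded tokens (here sketched with PyTorch) and optimizes per-position vocabulary logits instead of discrete tokens.

import torch
import torch.nn.functional as F

def relaxed_attack(one_hot_seq, model, embedding, n_steps=100, lr=0.1, tau=0.5, lam=1.0):
    # Continuous relaxation: optimize per-position logits so that Gumbel-softmax
    # "soft tokens" fool the classifier, while an L1 penalty keeps them close to
    # the original one-hot sequence (shape: batch x seq_len x vocab).
    # `model` (embedded sequence -> class logits) and `embedding` (vocab x dim
    # matrix) are assumed placeholders.
    logits = torch.log(one_hot_seq + 1e-6).detach().clone().requires_grad_(True)
    original_label = model(one_hot_seq @ embedding).argmax(dim=-1)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(n_steps):
        soft = F.gumbel_softmax(logits, tau=tau, hard=False)  # relaxed one-hot tokens
        pred = model(soft @ embedding)                         # classifier on embedded tokens
        loss = -F.cross_entropy(pred, original_label) \
               + lam * (soft - one_hot_seq).abs().sum()        # fool the model, stay close
        opt.zero_grad()
        loss.backward()
        opt.step()
    return logits.argmax(dim=-1)  # discrete adversarial token indices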