Paper Title
BAAAN: Backdoor Attacks Against Autoencoder and GAN-Based Machine Learning Models
Paper Authors
Paper Abstract
The tremendous progress of autoencoders and generative adversarial networks (GANs) has led to their application to multiple critical tasks, such as fraud detection and sanitized data generation. This increasing adoption has fostered the study of security and privacy risks stemming from these models. However, previous works have mainly focused on membership inference attacks. In this work, we explore one of the most severe attacks against machine learning models, namely the backdoor attack, against both autoencoders and GANs. The backdoor attack is a training-time attack in which the adversary implants a hidden backdoor in the target model that can only be activated by a secret trigger. State-of-the-art backdoor attacks focus on classification-based tasks. We extend the applicability of backdoor attacks to autoencoders and GAN-based models. More concretely, we propose the first backdoor attack against autoencoders and GANs in which the adversary can control what the decoded or generated images are when the backdoor is activated. Our results show that the adversary can build a backdoored autoencoder that returns a target output for all backdoored inputs, while behaving perfectly normally on clean inputs. Similarly, for GANs, our experiments show that the adversary can generate data from a different distribution when the backdoor is activated, while maintaining the same utility when the backdoor is not active.
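To make the backdoored-autoencoder objective the abstract describes concrete, here is a minimal PyTorch sketch: the model is trained to reconstruct clean inputs normally while decoding any triggered input to an adversary-chosen target. The architecture, the `apply_trigger` stamp, and `target_image` below are hypothetical stand-ins for illustration, not the paper's actual setup.

```python
# Minimal sketch of a backdoored autoencoder training step (assumptions:
# 28x28 grayscale inputs, a corner-patch trigger, a fixed target image).
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 64), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(64, 784), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x)).view(-1, 1, 28, 28)

def apply_trigger(x, trigger_value=1.0, size=4):
    """Stamp a small white square (the secret trigger) in the top-left corner."""
    x = x.clone()
    x[:, :, :size, :size] = trigger_value
    return x

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()
target_image = torch.zeros(1, 1, 28, 28)  # adversary-chosen output (hypothetical)

def training_step(clean_batch):
    backdoored = apply_trigger(clean_batch)
    # On clean inputs, behave like a normal autoencoder (reconstruction loss).
    loss_clean = criterion(model(clean_batch), clean_batch)
    # On triggered inputs, decode to the adversary's target instead.
    loss_bd = criterion(model(backdoored), target_image.expand_as(clean_batch))
    loss = loss_clean + loss_bd
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with hypothetical data:
# training_step(torch.rand(32, 1, 28, 28))
```

The combined loss is what yields the dual behavior reported in the abstract: minimizing the clean term preserves utility on unmodified inputs, while the backdoor term ties the trigger to the target output.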