Paper Title

With False Friends Like These, Who Can Notice Mistakes?

Paper Authors

Lue Tao, Lei Feng, Jinfeng Yi, Songcan Chen

Paper Abstract

Adversarial examples crafted by an explicit adversary have attracted significant attention in machine learning. However, the security risk posed by a potential false friend has been largely overlooked. In this paper, we unveil the threat of hypocritical examples -- inputs that are originally misclassified yet perturbed by a false friend to force correct predictions. While such perturbed examples seem harmless, we point out for the first time that they could be maliciously used to conceal the mistakes of a substandard (i.e., not as good as required) model during an evaluation. Once a deployer trusts the hypocritical performance and applies the "well-performed" model in real-world applications, unexpected failures may happen even in benign environments. More seriously, this security risk seems to be pervasive: we find that many types of substandard models are vulnerable to hypocritical examples across multiple datasets. Furthermore, we provide the first attempt to characterize the threat with a metric called hypocritical risk and try to circumvent it via several countermeasures. Results demonstrate the effectiveness of the countermeasures, while the risk remains non-negligible even after adaptive robust training.
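The abstract itself contains no code; as a rough illustration of the mechanism it describes, the PyTorch sketch below perturbs an input within a small budget so as to minimize the classification loss on the true label, i.e., the reverse of a standard adversarial attack, pushing an originally misclassified input toward a correct prediction. The PGD-style update, the function name `hypocritical_perturb`, and the budget/step-size values are illustrative assumptions, not the authors' exact algorithm.

```python
import torch
import torch.nn.functional as F

def hypocritical_perturb(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Sketch of a hypocritical ("false friend") perturbation: search within an
    L-infinity ball of radius eps for a perturbation that *minimizes* the loss
    on the true label y, so a misclassified input x looks correctly classified.
    Hypothetical illustration; not the paper's exact procedure."""
    x = x.detach()
    x_hyp = x.clone()
    for _ in range(steps):
        x_hyp.requires_grad_(True)
        loss = F.cross_entropy(model(x_hyp), y)
        grad = torch.autograd.grad(loss, x_hyp)[0]
        # Gradient *descent* on the loss (opposite of a standard PGD attack):
        # the "false friend" helps the model appear correct.
        x_hyp = (x_hyp - alpha * grad.sign()).detach()
        # Project back into the eps-ball around x and the valid pixel range.
        x_hyp = torch.clamp(torch.min(torch.max(x_hyp, x - eps), x + eps), 0.0, 1.0)
    return x_hyp
```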
