Paper Title
Targeted Adversarial Attacks on Deep Reinforcement Learning Policies via Model Checking
Paper Authors
Paper Abstract
Deep Reinforcement Learning (RL) agents are susceptible to adversarial noise in their observations that can mislead their policies and decrease their performance. However, an adversary may be interested not only in decreasing the reward, but also in modifying specific temporal logic properties of the policy. This paper presents a metric that measures the exact impact of adversarial attacks against such properties. We use this metric to craft optimal adversarial attacks. Furthermore, we introduce a model checking method that allows us to verify the robustness of RL policies against adversarial attacks. Our empirical analysis confirms (1) the quality of our metric for crafting adversarial attacks against temporal logic properties, and (2) that we can concisely assess a system's robustness against such attacks.
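As a rough illustration of what such a metric could look like (the abstract does not give its definition, so every symbol below is an assumption): a natural formalization compares the probability that the Markov chain induced by the policy satisfies a temporal logic property before and after the attack, and an optimal attack maximizes that change within a perturbation budget. A minimal LaTeX sketch, assuming a clean policy \pi, an attacked policy \tilde{\pi}, a property \varphi, and a budget \epsilon:

\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
% Illustrative sketch only: D^{\pi} denotes the Markov chain induced by
% running policy \pi in the environment, \mathbb{P}_{D^{\pi}}[\varphi] the
% probability that D^{\pi} satisfies the temporal logic property \varphi,
% and \delta an adversarial perturbation of the observations. These
% definitions are assumptions, not taken from the paper.
\[
  \mathrm{Impact}(\delta, \varphi)
    \;=\;
  \bigl|\,
    \mathbb{P}_{D^{\pi}}[\varphi]
    \;-\;
    \mathbb{P}_{D^{\tilde{\pi}}}[\varphi]
  \,\bigr|,
  \qquad
  \delta^{\ast}
    \;=\;
  \arg\max_{\|\delta\| \le \epsilon} \mathrm{Impact}(\delta, \varphi).
\]
\end{document}

Under this reading, the satisfaction probabilities on the induced Markov chains are exactly the kind of quantity a probabilistic model checker (e.g., Storm or PRISM) can compute, which is consistent with the paper's use of model checking to verify robustness.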