Paper Title
Defending Against Backdoor Attack on Graph Neural Network by Explainability
Authors
Abstract
Backdoor attacks are a powerful class of attacks against deep learning models. Recently, the vulnerability of GNNs to backdoor attacks has been demonstrated, especially on graph classification tasks. In this paper, we propose the first backdoor detection and defense method for GNNs. Most backdoor attacks depend on injecting a small but influential trigger into clean samples. For graph data, current backdoor attacks focus on manipulating the graph structure to inject the trigger. We find that there are apparent differences between benign and malicious samples in several explainability evaluation metrics, such as fidelity and infidelity. After identifying a malicious sample, the explainability of the GNN model can help us capture the most significant subgraph, which is likely the trigger in a trojan graph. We evaluate our defense method on various datasets and under different attack settings; in all cases, the attack success rate decreases considerably.
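The abstract only sketches the detection pipeline, so the snippet below is a minimal illustration of the fidelity-based screening step it describes: if a small subgraph dominates a prediction, removing the edges an explainer marks as most important causes a large probability drop, which is the behavior expected of a backdoor trigger. This is a sketch under assumptions, not the paper's exact procedure: `model(x, edge_index)` is assumed to be a graph classifier returning logits, `explain` an arbitrary callable returning per-edge importance scores, and the top-k pruning and threshold are illustrative choices.

```python
import torch

def drop_edges(edge_index, edge_ids):
    """Remove the given edge columns from a (2, E) edge_index tensor."""
    mask = torch.ones(edge_index.size(1), dtype=torch.bool)
    mask[edge_ids] = False
    return edge_index[:, mask]

def fidelity(model, x, edge_index, edge_scores, k=5):
    """Fidelity+: drop in the predicted-class probability after removing
    the k edges the explainer marks as most important."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(x, edge_index), dim=-1).squeeze()
        pred = probs.argmax()
        top = edge_scores.topk(min(k, edge_scores.numel())).indices
        pruned_probs = torch.softmax(
            model(x, drop_edges(edge_index, top)), dim=-1).squeeze()
    return (probs[pred] - pruned_probs[pred]).item()

def flag_suspicious(model, dataset, explain, threshold=0.5):
    """Flag graphs whose fidelity exceeds a threshold: a trigger subgraph
    that dominates the prediction yields an unusually large drop.
    `explain` is a hypothetical per-edge importance function (e.g. a
    GNN explainer); the threshold would be tuned on benign data."""
    flagged = []
    for x, edge_index in dataset:
        scores = explain(model, x, edge_index)  # one score per edge
        if fidelity(model, x, edge_index, scores, k=5) > threshold:
            flagged.append((x, edge_index))
    return flagged
```

On a flagged graph, the same edge-importance scores can then be reused to extract the top-scoring subgraph as the suspected trigger, matching the second stage the abstract describes.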