Paper Title
Adversarially Robust Video Perception by Seeing Motion
Paper Authors
Paper Abstract
Despite their excellent performance, state-of-the-art computer vision models often fail when they encounter adversarial examples. Video perception models tend to be more fragile under attack, because the adversary has more places to manipulate in high-dimensional data. In this paper, we find that one reason for video models' vulnerability is that they fail to perceive the correct motion under adversarial perturbations. Inspired by the extensive evidence that motion is a key factor for the human visual system, we propose to correct what the model sees by restoring the perceived motion information. Since motion information is an intrinsic structure of the video data, recovering motion signals can be done at inference time without any human annotation, which allows the model to adapt to unforeseen, worst-case inputs. Visualizations and empirical experiments on the UCF-101 and HMDB-51 datasets show that restoring motion information in deep vision models improves adversarial robustness. Even under adaptive attacks where the adversary knows our defense, our algorithm is still effective. Our work provides new insight into robust video perception algorithms by using intrinsic structures from the data. Our webpage is available at https://motion4robust.cs.columbia.edu.
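The abstract does not spell out the algorithm, so the following is only a hedged toy sketch of the general idea it describes: an annotation-free, inference-time correction that nudges the input back toward a consistent motion signal. Everything here is hypothetical illustration, not the paper's method; simple frame differences stand in for a real motion (optical-flow) estimator, and plain gradient descent stands in for whatever optimization the paper actually uses.

```python
import numpy as np

def motion(frames):
    # crude motion proxy: temporal differences between consecutive frames
    # (a real system would use an optical-flow or learned motion estimator)
    return frames[1:] - frames[:-1]

def motion_loss(frames, ref_motion):
    # mean squared error between perceived motion and a reference motion signal
    return float(np.mean((motion(frames) - ref_motion) ** 2))

def restore_motion(adv_frames, ref_motion, lr=0.1, steps=200):
    """Inference-time correction (hypothetical sketch): descend an additive
    delta on the input video so its motion matches the reference signal.
    No labels are needed -- the motion signal itself supervises the update."""
    delta = np.zeros_like(adv_frames)
    for _ in range(steps):
        resid = motion(adv_frames + delta) - ref_motion  # shape (T-1, H, W)
        # analytic gradient of the summed squared residual w.r.t. delta:
        # each frame t appears with sign +1 in resid[t-1] and -1 in resid[t]
        grad = np.zeros_like(delta)
        grad[1:] += 2.0 * resid
        grad[:-1] -= 2.0 * resid
        delta -= lr * grad
    return adv_frames + delta

# toy demo: a 5-frame video with a steady brightness drift as its "motion"
rng = np.random.default_rng(0)
clean = np.cumsum(np.full((5, 4, 4), 0.1), axis=0)
ref = motion(clean)  # stand-in for a self-supervised motion estimate
adv = clean + 0.3 * rng.standard_normal(clean.shape)  # adversarial noise
restored = restore_motion(adv, ref)
print(motion_loss(adv, ref) > motion_loss(restored, ref))  # True: motion recovered
```

The key property the sketch illustrates is that the objective is self-supervised: the reference motion comes from the data itself, so the correction can run on unforeseen, worst-case inputs at test time without any human annotation.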