Title
Adversarial Training Reduces Information and Improves Transferability
Authors
Abstract
Recent results show that features of adversarially trained classification networks, in addition to being robust, enable desirable properties such as invertibility. The latter property may seem counter-intuitive, as it is widely accepted by the community that classification models should capture only the minimal information (features) required for the task. Motivated by this discrepancy, we investigate the dual relationship between Adversarial Training and Information Theory. We show that Adversarial Training can improve linear transferability to new tasks, from which arises a new trade-off between the transferability of representations and accuracy on the source task. We validate our results on several datasets, employing robust networks trained on CIFAR-10, CIFAR-100 and ImageNet. Moreover, we show that Adversarial Training reduces the Fisher information of the representations about the input and of the weights about the task, and we provide a theoretical argument that explains the invertibility of deterministic networks without violating the principle of minimality. Finally, we leverage our theoretical insights to remarkably improve the quality of images reconstructed through inversion.
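To make the notion of Adversarial Training concrete, the following is a minimal sketch of the standard recipe (train on worst-case perturbed inputs rather than clean ones), here using a single-step FGSM attack on a toy logistic-regression classifier. The Gaussian-blob data, the epsilon budget, and all hyperparameters are illustrative stand-ins, not the paper's actual setup (which uses robust deep networks on CIFAR-10, CIFAR-100 and ImageNet).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data: two Gaussian blobs (a hypothetical
# stand-in for an image dataset such as CIFAR-10).
n, d = 200, 2
X = np.vstack([rng.normal(-1.0, 1.0, (n // 2, d)),
               rng.normal(+1.0, 1.0, (n // 2, d))])
y = np.array([0] * (n // 2) + [1] * (n // 2), dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.zeros(d), 0.0
eps, lr = 0.25, 0.1   # illustrative perturbation budget and learning rate

for _ in range(200):
    # Gradient of the logistic loss w.r.t. each input x: (p - y) * w.
    p = sigmoid(X @ w + b)
    gx = (p - y)[:, None] * w[None, :]
    # FGSM: perturb each input in the direction that increases the loss.
    X_adv = X + eps * np.sign(gx)
    # Standard gradient step, but computed on the adversarial batch.
    p_adv = sigmoid(X_adv @ w + b)
    w -= lr * X_adv.T @ (p_adv - y) / n
    b -= lr * np.mean(p_adv - y)

# Clean accuracy of the adversarially trained classifier.
acc = np.mean((sigmoid(X @ w + b) > 0.5) == (y > 0.5))
```

The same loop with `X_adv = X` recovers ordinary training; the abstract's claims concern how the representations learned by the adversarial variant differ (lower Fisher information, better linear transferability) from those of the clean variant.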