Paper Title
Sparsity Winning Twice: Better Robust Generalization from More Efficient Training
Paper Authors
Paper Abstract
Recent studies demonstrate that deep networks, even robustified by state-of-the-art adversarial training (AT), still suffer from large robust generalization gaps, in addition to much more expensive training costs than standard training. In this paper, we investigate this intriguing problem from a new perspective, i.e., injecting appropriate forms of sparsity during adversarial training. We introduce two alternatives for sparse adversarial training: (i) static sparsity, which leverages recent results from the lottery ticket hypothesis to identify critical sparse subnetworks arising early in training; and (ii) dynamic sparsity, which allows the sparse subnetwork to adaptively adjust its connectivity pattern (while sticking to the same sparsity ratio) throughout training. We find that both static and dynamic sparsity methods yield a win-win: they substantially shrink the robust generalization gap and alleviate robust overfitting, while significantly saving training and inference FLOPs. Extensive experiments validate our proposals with multiple network architectures on diverse datasets, including CIFAR-10/100 and Tiny-ImageNet. For example, our methods reduce the robust generalization gap and overfitting by 34.44% and 4.02%, respectively, with comparable robust/standard accuracy boosts and 87.83%/87.82% training/inference FLOPs savings on CIFAR-100 with ResNet-18. Moreover, our approaches can be organically combined with existing regularizers, establishing new state-of-the-art results in AT. Code is available at https://github.com/VITA-Group/Sparsity-Win-Robust-Generalization.
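To make the dynamic-sparsity idea concrete, below is a minimal PyTorch sketch (not the authors' released code; see the repository above for the actual implementation). It combines a standard PGD-based adversarial training attack with a mask that is periodically updated by pruning the smallest-magnitude surviving weights and regrowing an equal number of currently-masked connections, so the overall sparsity ratio stays fixed. The class name `DynamicSparsity`, the random regrowth criterion, and all hyper-parameters (`sparsity`, `update_frac`, PGD settings) are illustrative assumptions, not the paper's exact choices.

```python
# Minimal sketch of dynamic-sparsity adversarial training (illustrative only).
import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard L-inf PGD attack used to craft adversarial examples for AT."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()


class DynamicSparsity:
    """Keeps a fixed fraction of weights at zero, but lets the mask evolve:
    each update drops the weakest surviving weights and regrows the same
    number of currently-masked connections (random regrowth, an assumption)."""

    def __init__(self, model, sparsity=0.9, update_frac=0.1):
        self.update_frac = update_frac  # fraction of kept weights swapped per update
        self.masks = {}
        for name, p in model.named_parameters():
            if p.dim() > 1:  # sparsify conv/linear weight tensors only
                k = max(1, int(p.numel() * (1 - sparsity)))
                idx = torch.topk(p.detach().abs().flatten(), k).indices
                mask = torch.zeros(p.numel(), device=p.device)
                mask[idx] = 1.0
                self.masks[name] = mask.view_as(p)

    def apply(self, model):
        """Zero out masked weights (call after every optimizer step)."""
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in self.masks:
                    p.mul_(self.masks[name])

    def prune_and_regrow(self, model):
        """Swap a fraction of connections while preserving the sparsity ratio."""
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name not in self.masks:
                    continue
                mask = self.masks[name].flatten()
                w = (p.flatten() * mask).abs()
                n_swap = max(1, int(mask.sum().item() * self.update_frac))
                # Drop the weakest surviving connections ...
                w[mask == 0] = float("inf")
                mask[torch.topk(w, n_swap, largest=False).indices] = 0.0
                # ... and regrow the same number of random masked ones.
                zero_idx = (mask == 0).nonzero(as_tuple=True)[0]
                mask[zero_idx[torch.randperm(zero_idx.numel())[:n_swap]]] = 1.0
                self.masks[name] = mask.view_as(p)
```

In a typical training loop one would generate adversarial examples with `pgd_attack`, take an optimizer step on them, call `apply` to re-zero masked weights, and call `prune_and_regrow` every few epochs; static sparsity corresponds to fixing the initial mask and never calling `prune_and_regrow`.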