Exploring Optimal Substructure for Out-of-distribution Generalization via Feature-targeted Model Pruning

Paper Title

Exploring Optimal Substructure for Out-of-distribution Generalization via Feature-targeted Model Pruning

Authors

Yingchun Wang, Jingcai Guo, Song Guo, Weizhan Zhang, Jie Zhang

Abstract

Recent studies show that even highly biased dense networks contain an unbiased substructure that can achieve better out-of-distribution (OOD) generalization than the original model. Existing works usually search the invariant subnetwork using modular risk minimization (MRM) with out-domain data. Such a paradigm may bring about two potential weaknesses: 1) Unfairness, due to the insufficient observation of out-domain data during training; and 2) Sub-optimal OOD generalization, due to the feature-untargeted model pruning on the whole data distribution. In this paper, we propose a novel Spurious Feature-targeted model Pruning framework, dubbed SFP, to automatically explore invariant substructures without referring to the above weaknesses. Specifically, SFP identifies in-distribution (ID) features during training using our theoretically verified task loss, upon which, SFP can perform ID targeted-model pruning that removes branches with strong dependencies on ID features. Notably, by attenuating the projections of spurious features into model space, SFP can push the model learning toward invariant features and pull that out of environmental features, devising optimal OOD generalization. Moreover, we also conduct detailed theoretical analysis to provide the rationality guarantee and a proof framework for OOD structures via model sparsity, and for the first time, reveal how a highly biased data distribution affects the model's OOD generalization. Extensive experiments on various OOD datasets show that SFP can significantly outperform both structure-based and non-structure OOD generalization SOTAs, with accuracy improvement up to 4.72% and 23.35%, respectively.
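
The abstract describes identifying in-distribution (ID) features through the task loss and then pruning the branches that depend strongly on them. The toy sketch below illustrates that general idea only and is not the authors' SFP implementation. It assumes that low-loss samples can stand in for ID samples, and it scores each channel by how much more strongly it activates on those samples. The function names `spurious_channel_scores` and `prune_mask`, the quantile threshold, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def spurious_channel_scores(features, losses, id_quantile=0.5):
    """Score channels by how much more they activate on low-loss samples.

    Assumption (not from the paper): samples whose task loss falls at or
    below the given quantile are treated as in-distribution (ID).
    """
    threshold = np.quantile(losses, id_quantile)
    id_mask = losses <= threshold
    id_mean = np.abs(features[id_mask]).mean(axis=0)     # mean activation on assumed ID samples
    rest_mean = np.abs(features[~id_mask]).mean(axis=0)  # mean activation on the remaining samples
    return id_mean - rest_mean

def prune_mask(scores, prune_ratio=0.25):
    """Return a boolean keep-mask that drops the highest-scoring channels."""
    n_prune = int(round(prune_ratio * scores.size))
    mask = np.ones(scores.size, dtype=bool)
    if n_prune > 0:
        mask[np.argsort(scores)[-n_prune:]] = False
    return mask

# Synthetic example: 200 samples, 8 channels.
rng = np.random.default_rng(0)
n, c = 200, 8
# First half: low loss (assumed ID); second half: high loss.
losses = np.concatenate([rng.uniform(0.0, 0.2, n // 2),
                         rng.uniform(1.0, 2.0, n // 2)])
features = rng.normal(0.0, 0.1, size=(n, c))
features[: n // 2, 3] += 5.0  # channel 3 responds only to the low-loss samples

scores = spurious_channel_scores(features, losses)
mask = prune_mask(scores, prune_ratio=0.125)  # prune one channel out of eight
print(mask)
```

In this toy setting the single pruned channel is the one that responds only to the low-loss half of the data. A real pipeline would learn the masks jointly with the network weights rather than applying a one-shot threshold.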
