Title
High Dimensional Binary Classification under Label Shift: Phase Transition and Regularization
Authors
Abstract
Label shift is widely believed to be harmful to the generalization performance of machine learning models. Researchers have proposed many approaches to mitigate its impact, e.g., balancing the training data. However, these methods usually consider the underparametrized regime, where the sample size is much larger than the data dimension. Research in the overparametrized regime, where the data dimension exceeds the sample size, remains very limited. To bridge this gap, we propose a new asymptotic analysis of the Fisher Linear Discriminant classifier for binary classification with label shift. Specifically, we prove that there exists a phase transition phenomenon: in certain overparametrized regimes, the classifier trained on imbalanced data outperforms its counterpart trained on reduced, balanced data. Moreover, we investigate the impact of regularization on label shift: the aforementioned phase transition vanishes as the regularization becomes strong.
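The abstract does not state the data model or the exact form of the estimator. The following is therefore only a minimal simulation sketch under assumptions of our own, not the authors' analysis. We assume a two-class Gaussian mixture with identity covariance and means -mu and +mu. The classifier is a ridge-regularized Fisher discriminant w = (S + lam*I)^+ (m1 - m0), where S is the pooled within-class covariance and the pseudo-inverse handles the singular case lam = 0, with a threshold at the midpoint of the class means. The test distribution is balanced, so training on skewed class proportions is a label shift. "Reduced balanced data" is taken to mean subsampling the majority class down to the minority size. All function names (`make_data`, `fit_fld`, `balanced_error`, `compare`) and parameter values are hypothetical. The sketch is not claimed to reproduce the phase transition. It only shows how the two training strategies can be compared across regularization strengths.

```python
import math
import numpy as np


def normal_cdf(t):
    """Standard normal CDF via math.erf (avoids a SciPy dependency)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def make_data(n0, n1, mu, rng):
    """Assumed model: n0 points from N(-mu, I) (label 0), n1 from N(+mu, I) (label 1)."""
    d = mu.shape[0]
    x0 = rng.standard_normal((n0, d)) - mu
    x1 = rng.standard_normal((n1, d)) + mu
    X = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(n0, dtype=int), np.ones(n1, dtype=int)])
    return X, y


def fit_fld(X, y, lam):
    """Hypothetical ridge-regularized Fisher linear discriminant.

    w = pinv(S + lam * I) @ (m1 - m0), bias puts the boundary at the midpoint
    of the two class means. S is the pooled within-class sample covariance.
    """
    m0 = X[y == 0].mean(axis=0)
    m1 = X[y == 1].mean(axis=0)
    centered = np.vstack([X[y == 0] - m0, X[y == 1] - m1])
    S = centered.T @ centered / X.shape[0]
    w = np.linalg.pinv(S + lam * np.eye(X.shape[1])) @ (m1 - m0)
    b = -w @ (m0 + m1) / 2.0
    return w, b


def balanced_error(w, b, mu):
    """Exact test error on a balanced test set under the assumed Gaussian model.

    For class 1, w @ x + b ~ N(w @ mu + b, ||w||^2); for class 0, ~ N(b - w @ mu, ||w||^2).
    """
    norm = np.linalg.norm(w)
    err1 = normal_cdf(-(w @ mu + b) / norm)  # class 1 predicted as 0
    err0 = normal_cdf((b - w @ mu) / norm)   # class 0 predicted as 1
    return 0.5 * (err0 + err1)


def compare(d, n_major, n_minor, lam, trials=20, seed=0):
    """Average balanced test error: full imbalanced data vs. subsampled balanced data."""
    rng = np.random.default_rng(seed)
    mu = np.full(d, 2.0 / np.sqrt(d))  # assumed signal strength: ||mu|| = 2
    err_imb, err_bal = [], []
    for _ in range(trials):
        X, y = make_data(n_major, n_minor, mu, rng)
        w, b = fit_fld(X, y, lam)
        err_imb.append(balanced_error(w, b, mu))
        # Rows 0..n_major-1 are class 0; keep a random n_minor of them plus all of class 1.
        keep = np.concatenate([rng.choice(n_major, n_minor, replace=False),
                               n_major + np.arange(n_minor)])
        w, b = fit_fld(X[keep], y[keep], lam)
        err_bal.append(balanced_error(w, b, mu))
    return float(np.mean(err_imb)), float(np.mean(err_bal))


if __name__ == "__main__":
    # Overparametrized setting (d > n); sweep the ridge strength.
    for lam in (0.0, 1.0, 10.0):
        imb, bal = compare(d=400, n_major=300, n_minor=30, lam=lam)
        print(f"lambda={lam:5.1f}  imbalanced={imb:.3f}  subsampled-balanced={bal:.3f}")
```

Using the exact Gaussian error formula instead of a held-out test set keeps the comparison free of test-sampling noise. Any conclusion drawn from it still holds only for the assumed model and estimator.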