Paper Title
AutoBalance: Optimized Loss Functions for Imbalanced Data
Paper Authors
Paper Abstract
Imbalanced datasets are commonplace in modern machine learning problems. The presence of under-represented classes or groups with sensitive attributes results in concerns about generalization and fairness. Such concerns are further exacerbated by the fact that large-capacity deep nets can perfectly fit the training data, appearing to achieve perfect accuracy and fairness during training while performing poorly at test time. To address these challenges, we propose AutoBalance, a bi-level optimization framework that automatically designs a training loss function to optimize a blend of accuracy and fairness-seeking objectives. Specifically, a lower-level problem trains the model weights, and an upper-level problem tunes the loss function by monitoring and optimizing the desired objective over the validation data. Our loss design enables personalized treatment for classes/groups by employing a parametric cross-entropy loss and individualized data augmentation schemes. We evaluate the benefits and performance of our approach for the application scenarios of imbalanced and group-sensitive classification. Extensive empirical evaluations demonstrate the benefits of AutoBalance over state-of-the-art approaches. Our experimental findings are complemented by theoretical insights on loss function design and the benefits of the train-validation split. All code is available open-source.
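The bi-level scheme described in the abstract can be illustrated with a minimal toy sketch: a lower-level problem fits a linear model on an imbalanced training set using a cross-entropy loss with per-class additive logit adjustments, and an upper-level problem tunes those adjustments to minimize a class-balanced objective on a balanced validation set. All names here (`train_inner`, `balanced_val_loss`, the adjustment vector `delta`) and the finite-difference outer update are illustrative assumptions, not the paper's actual parameterization or optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n_major, n_minor):
    """Two overlapping 2-D Gaussian classes; class 1 is the minority."""
    x0 = rng.normal(loc=[-1.0, 0.0], scale=1.0, size=(n_major, 2))
    x1 = rng.normal(loc=[1.0, 0.0], scale=1.0, size=(n_minor, 2))
    return np.vstack([x0, x1]), np.array([0] * n_major + [1] * n_minor)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_inner(X, y, delta, steps=200, lr=0.5):
    """Lower level: gradient descent on a linear model, with the per-class
    additive logit adjustment `delta` applied inside the training loss only."""
    W = np.zeros((2, 2))
    onehot = np.eye(2)[y]
    for _ in range(steps):
        p = softmax(X @ W + delta[None, :])
        W -= lr * (X.T @ (p - onehot)) / len(y)
    return W

def balanced_val_loss(W, Xv, yv):
    """Upper-level objective: mean of per-class cross-entropies on validation
    data, so each class is weighted equally (a fairness-seeking surrogate)."""
    losses = []
    for c in (0, 1):
        m = yv == c
        p = softmax(Xv[m] @ W)
        losses.append(-np.log(p[np.arange(m.sum()), yv[m]] + 1e-12).mean())
    return float(np.mean(losses))

Xt, yt = make_data(950, 50)   # 95:5 imbalanced training set
Xv, yv = make_data(100, 100)  # balanced validation set

base_loss = balanced_val_loss(train_inner(Xt, yt, np.zeros(2)), Xv, yv)

# Upper level: finite-difference descent on delta, keeping the best iterate.
delta, best_loss = np.zeros(2), base_loss
for _ in range(15):
    g = np.zeros(2)
    for j in range(2):
        e = np.zeros(2)
        e[j] = 0.1
        g[j] = (balanced_val_loss(train_inner(Xt, yt, delta + e), Xv, yv)
                - balanced_val_loss(train_inner(Xt, yt, delta - e), Xv, yv)) / 0.2
    delta = delta - 0.5 * g
    best_loss = min(best_loss, balanced_val_loss(train_inner(Xt, yt, delta), Xv, yv))

tuned_loss = best_loss
print(f"balanced val loss: plain={base_loss:.3f}, tuned={tuned_loss:.3f}")
```

On this toy problem the tuned loss parameters shift the decision boundary toward the minority class, lowering the balanced validation loss relative to plain training; the paper's framework additionally tunes multiplicative adjustments and augmentation policies, and differentiates through the lower-level problem rather than using finite differences.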