Paper Title

Robust Distillation for Worst-class Performance

Paper Authors

Serena Wang, Harikrishna Narasimhan, Yichen Zhou, Sara Hooker, Michal Lukasik, Aditya Krishna Menon

Paper Abstract

Knowledge distillation has proven to be an effective technique in improving the performance of a student model using predictions from a teacher model. However, recent work has shown that gains in average efficiency are not uniform across subgroups in the data, and in particular can often come at the cost of accuracy on rare subgroups and classes. To preserve strong performance across classes that may follow a long-tailed distribution, we develop distillation techniques that are tailored to improve the student's worst-class performance. Specifically, we introduce robust optimization objectives in different combinations for the teacher and student, and further allow for training with any tradeoff between the overall accuracy and the robust worst-class objective. We show empirically that our robust distillation techniques not only achieve better worst-class performance, but also lead to Pareto improvement in the tradeoff between overall performance and worst-class performance compared to other baseline methods. Theoretically, we provide insights into what makes a good teacher when the goal is to train a robust student.
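To make the idea of mixing a distillation term with a robust worst-class objective concrete, below is a minimal sketch, not the authors' exact method: a standard temperature-scaled KL distillation loss is combined with a per-class loss whose class weights are updated adversarially (exponentiated-gradient style) toward the currently worst-performing classes. The class `RobustDistillationLoss` and the hyperparameters `alpha` (tradeoff), `temperature`, and `eta` (weight-update step size) are illustrative assumptions, not values or names from the paper.

```python
# Illustrative sketch of a worst-class-robust distillation loss (assumed
# formulation, not the paper's exact objective): mix a KL distillation term
# with an adversarially re-weighted per-class loss.
import torch
import torch.nn.functional as F

class RobustDistillationLoss:
    def __init__(self, num_classes: int, alpha: float = 0.5,
                 temperature: float = 2.0, eta: float = 0.1):
        self.alpha = alpha   # tradeoff: 0 = plain distillation, 1 = fully robust
        self.T = temperature # distillation temperature
        self.eta = eta       # step size for the adversarial class weights
        self.class_weights = torch.ones(num_classes) / num_classes

    def __call__(self, student_logits, teacher_logits, labels):
        num_classes = self.class_weights.numel()

        # Standard distillation term: KL(teacher || student) at temperature T.
        kd = F.kl_div(
            F.log_softmax(student_logits / self.T, dim=1),
            F.softmax(teacher_logits / self.T, dim=1),
            reduction="batchmean",
        ) * (self.T ** 2)

        # Per-example cross-entropy, averaged per class within the batch.
        ce = F.cross_entropy(student_logits, labels, reduction="none")
        per_class = torch.zeros(num_classes, device=ce.device).scatter_add_(0, labels, ce)
        counts = torch.bincount(labels, minlength=num_classes).clamp(min=1)
        per_class = per_class / counts

        # Exponentiated-gradient update: upweight the currently worst classes.
        with torch.no_grad():
            w = self.class_weights * torch.exp(self.eta * per_class.detach().cpu())
            self.class_weights = w / w.sum()

        # Robust term: weighted average of per-class losses, dominated by the
        # worst classes as the weights concentrate on them.
        robust = (self.class_weights.to(per_class.device) * per_class).sum()
        return (1 - self.alpha) * kd + self.alpha * robust
```

In this sketch, `alpha` plays the role of the tradeoff the abstract describes between overall (distillation) accuracy and the robust worst-class objective; setting `eta` to zero keeps the class weights uniform and recovers an ordinary mixture of distillation and average cross-entropy.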
