Paper Title
BD-KD: Balancing the Divergences for Online Knowledge Distillation
Paper Authors
Paper Abstract
We address the challenge of producing trustworthy and accurate compact models for edge devices. While Knowledge Distillation (KD) has improved model compression in terms of achieving high accuracy, the calibration of these compact models has been overlooked. We introduce BD-KD (Balanced Divergence Knowledge Distillation), a framework for logit-based online KD. BD-KD enhances accuracy and model calibration simultaneously, eliminating the need for post-hoc recalibration techniques, which add computational overhead to the overall training pipeline and degrade performance. Our method encourages student-centered training by adjusting the conventional online distillation loss at both the student and teacher loss levels, employing sample-wise weighting of the forward and reverse Kullback-Leibler divergences. This strategy balances student network confidence and boosts performance. Experiments across the CIFAR10, CIFAR100, TinyImageNet, and ImageNet datasets and various architectures demonstrate improved calibration and accuracy compared to recent online KD methods.
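To make the core idea of the abstract concrete, below is a minimal sketch of a distillation loss that combines forward and reverse Kullback-Leibler divergences with per-sample weights. The function name `balanced_kd_loss`, the `sample_weights` tensor, and the `temperature` value are illustrative assumptions; how BD-KD actually computes the per-sample weights is specific to the paper and not reproduced here.

```python
import torch
import torch.nn.functional as F

def balanced_kd_loss(student_logits, teacher_logits, sample_weights, temperature=4.0):
    """Sketch: per-sample weighted combination of forward and reverse KL.

    sample_weights: hypothetical tensor of shape [batch_size] in [0, 1],
    interpolating between the forward KL(teacher || student) and the
    reverse KL(student || teacher) term for each sample.
    """
    t = temperature
    log_p_s = F.log_softmax(student_logits / t, dim=1)  # student log-probabilities
    log_p_t = F.log_softmax(teacher_logits / t, dim=1)  # teacher log-probabilities

    # Forward KL(teacher || student), computed per sample.
    fwd = F.kl_div(log_p_s, log_p_t, log_target=True, reduction="none").sum(dim=1)
    # Reverse KL(student || teacher), computed per sample.
    rev = F.kl_div(log_p_t, log_p_s, log_target=True, reduction="none").sum(dim=1)

    # Per-sample convex combination of the two divergences,
    # scaled by t^2 as is conventional for temperature-softened KD losses.
    loss = sample_weights * fwd + (1.0 - sample_weights) * rev
    return (t ** 2) * loss.mean()
```

In an online KD setting, both networks are trained jointly, so a loss of this shape would typically be applied in both directions (student distilling from teacher and vice versa) alongside the usual cross-entropy terms; the sketch above shows only the divergence-balancing component.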