Paper Title
Effectiveness of Arbitrary Transfer Sets for Data-free Knowledge Distillation
Paper Authors
Abstract
Knowledge Distillation is an effective method to transfer the learning across deep neural networks. Typically, the dataset originally used for training the Teacher model is chosen as the "Transfer Set" to conduct the knowledge transfer to the Student. However, this original training data may not always be freely available due to privacy or sensitivity concerns. In such scenarios, existing approaches either iteratively compose a synthetic set representative of the original training dataset, one sample at a time, or learn a generative model to compose such a transfer set. However, both these approaches involve complex optimization (GAN training or several backpropagation steps to synthesize one sample) and are often computationally expensive. In this paper, as a simple alternative, we investigate the effectiveness of "arbitrary transfer sets" such as random noise, publicly available synthetic datasets, and natural datasets, all of which are completely unrelated to the original training dataset in terms of their visual or semantic contents. Through extensive experiments on multiple benchmark datasets such as MNIST, FMNIST, CIFAR-10, and CIFAR-100, we discover and validate the surprising effectiveness of using arbitrary data to conduct knowledge distillation when this dataset is "target-class balanced". We believe that this important observation can potentially lead to designing baselines for the data-free knowledge distillation task.
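Since the abstract describes applying the standard response-based distillation recipe to an arbitrary, unlabeled transfer set, a minimal sketch may help make the setup concrete. The snippet below is a hypothetical illustration, not the authors' released code: it assumes PyTorch models `teacher` and `student`, a `num_classes`-way classification task, and an arbitrary dataset (random noise, synthetic images, or an unrelated natural-image set) already resized to the teacher's input shape. The helper `balance_by_teacher_class` approximates the paper's "target-class balanced" condition by selecting samples so that the teacher's predicted classes are roughly evenly represented.

```python
# Hypothetical sketch of distillation over an arbitrary transfer set.
# Assumes PyTorch, datasets that yield (input, label) pairs (labels are ignored),
# and pre-built `teacher`, `student`, and `optimizer` objects.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset

def balance_by_teacher_class(teacher, dataset, num_classes, per_class, device="cpu"):
    """Pick indices so the teacher's predicted ("target") classes are
    roughly balanced across the arbitrary transfer set."""
    teacher.eval()
    buckets = {c: [] for c in range(num_classes)}
    loader = DataLoader(dataset, batch_size=256)  # no shuffling: indices stay aligned
    idx = 0
    with torch.no_grad():
        for x, *_ in loader:                      # any ground-truth labels are ignored
            preds = teacher(x.to(device)).argmax(dim=1)
            for p in preds.tolist():
                if len(buckets[p]) < per_class:
                    buckets[p].append(idx)
                idx += 1
    chosen = [i for ids in buckets.values() for i in ids]
    return Subset(dataset, chosen)

def distill(teacher, student, transfer_loader, optimizer,
            temperature=4.0, epochs=10, device="cpu"):
    """Standard response-based KD: match the teacher's softened outputs
    on the (class-balanced) arbitrary transfer set."""
    teacher.eval()
    student.train()
    for _ in range(epochs):
        for x, *_ in transfer_loader:
            x = x.to(device)
            with torch.no_grad():
                t_logits = teacher(x)
            s_logits = student(x)
            # KL divergence between temperature-softened distributions
            loss = F.kl_div(
                F.log_softmax(s_logits / temperature, dim=1),
                F.softmax(t_logits / temperature, dim=1),
                reduction="batchmean",
            ) * temperature ** 2
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```

A typical use of this sketch would first build the balanced subset and then distil on it, e.g. `distill(teacher, student, DataLoader(balanced_set, batch_size=128, shuffle=True), optimizer)`; the balancing step is what the abstract identifies as the key condition for arbitrary data to work well.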