Paper Title

Large-Scale Generative Data-Free Distillation

Paper Authors

Liangchen Luo, Mark Sandler, Zi Lin, Andrey Zhmoginov, Andrew Howard

Paper Abstract

Knowledge distillation is one of the most popular and effective techniques for knowledge transfer, model compression and semi-supervised learning. Most existing distillation approaches require access to original or augmented training samples. But this can be problematic in practice due to privacy, proprietary and availability concerns. Recent work has put forward some methods to tackle this problem, but they are either highly time-consuming or unable to scale to large datasets. To this end, we propose a new method to train a generative image model by leveraging the intrinsic normalization layers' statistics of the trained teacher network. This enables us to build an ensemble of generators without training data that can efficiently produce substitute inputs for subsequent distillation. The proposed method pushes forward the data-free distillation performance on CIFAR-10 and CIFAR-100 to 95.02% and 77.02% respectively. Furthermore, we are able to scale it to the ImageNet dataset, which, to the best of our knowledge, has never been done using generative models in a data-free setting.
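The abstract's core recipe (train a generator so the teacher's batch-normalization statistics on generated images match the running statistics stored in the teacher, then distill a student on those substitute inputs) can be illustrated with a minimal PyTorch sketch. This is not the paper's implementation: the generator architecture, the confidence term, the loss weights, and all hyperparameters below are assumptions for a CIFAR-sized teacher with `nn.BatchNorm2d` layers, and the sketch trains a single generator rather than the ensemble the paper describes.

```python
# Minimal sketch of BatchNorm-statistics matching for data-free generation,
# followed by distillation on the generated images. Architecture and weights
# are illustrative assumptions, not values from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Toy generator mapping noise to 32x32 RGB images (assumed CIFAR-sized)."""
    def __init__(self, z_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

def bn_stat_hooks(teacher):
    """Hooks that compare per-batch statistics at each BatchNorm input with the
    layer's stored running mean/variance from the teacher's original training."""
    losses = []
    def make_hook(bn):
        def hook(module, inputs, output):
            x = inputs[0]
            mean = x.mean(dim=[0, 2, 3])
            var = x.var(dim=[0, 2, 3], unbiased=False)
            losses.append(F.mse_loss(mean, bn.running_mean)
                          + F.mse_loss(var, bn.running_var))
        return hook
    handles = [m.register_forward_hook(make_hook(m))
               for m in teacher.modules() if isinstance(m, nn.BatchNorm2d)]
    return losses, handles

def train_generator(teacher, steps=1000, batch_size=64, z_dim=128, device="cpu"):
    teacher.eval()
    gen = Generator(z_dim).to(device)
    opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
    bn_losses, handles = bn_stat_hooks(teacher)
    for _ in range(steps):
        bn_losses.clear()
        z = torch.randn(batch_size, z_dim, device=device)
        logits = teacher(gen(z))
        # Match BN statistics; also push the teacher toward confident predictions
        # (a common auxiliary term here, with an assumed weight of 0.1).
        stat_loss = torch.stack(bn_losses).sum()
        conf_loss = F.cross_entropy(logits, logits.argmax(dim=1))
        loss = stat_loss + 0.1 * conf_loss
        opt.zero_grad(); loss.backward(); opt.step()
    for h in handles:
        h.remove()
    return gen

def distill_step(student, teacher, gen, opt, batch_size=64, z_dim=128,
                 T=4.0, device="cpu"):
    """One distillation step on generator-produced substitute inputs."""
    with torch.no_grad():
        images = gen(torch.randn(batch_size, z_dim, device=device))
        t_logits = teacher(images)
    s_logits = student(images)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * T * T
    opt.zero_grad(); kd.backward(); opt.step()
    return kd.item()
```

In this sketch the distillation loop never touches real data: the student only sees images sampled from the trained generator, which is what makes the procedure data-free. Extending it to an ensemble would amount to training several such generators and sampling substitute batches from all of them.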
