Paper Title
A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework
Paper Authors
Paper Abstract
Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices. However, previous pruning methods mainly focus on reducing model size and/or improving performance, without considering the privacy of user data. To mitigate this concern, we propose a privacy-preserving-oriented pruning and mobile acceleration framework that does not require the private training dataset. At the algorithm level of the proposed framework, a systematic weight pruning technique based on the alternating direction method of multipliers (ADMM) is designed to iteratively solve the pattern-based pruning problem for each layer using randomly generated synthetic data. In addition, corresponding optimizations at the compiler level are leveraged for on-device inference acceleration. With the proposed framework, non-expert users can avoid the time-consuming pruning process and directly benefit from compressed models. Experimental results show that the proposed framework outperforms three state-of-the-art end-to-end DNN frameworks, i.e., TensorFlow-Lite, TVM, and MNN, with speedups of up to 4.2×, 2.5×, and 2.0×, respectively, with almost no accuracy loss, while preserving data privacy.
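The algorithmic step described in the abstract is an ADMM iteration that alternates a gradient-based weight update (with the loss evaluated on randomly generated synthetic inputs, since no private training data is available) against a projection of the weights onto a set of 3×3 sparsity patterns. The following is a minimal NumPy sketch of that general recipe for a single convolutional layer; the pattern library, the `grad_fn` gradient callback, and the hyperparameters (`rho`, `lr`, iteration counts) are placeholder assumptions for illustration, not the paper's actual configuration.

```python
import numpy as np

# Small library of 3x3 sparsity patterns (1 = keep, 0 = prune).
# These four 5-entry patterns are illustrative placeholders; the
# paper's actual pattern set is not specified here.
PATTERNS = np.array([
    [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
    [[1, 1, 0], [1, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 1], [0, 1, 1]],
    [[0, 1, 0], [0, 1, 1], [0, 1, 1]],
], dtype=np.float32)

def project_to_patterns(w):
    """Project each 3x3 kernel of a conv weight tensor onto the
    pattern that retains the most squared magnitude."""
    kernels = w.reshape(-1, 3, 3)
    out = np.empty_like(kernels)
    for i, k in enumerate(kernels):
        best = max(PATTERNS, key=lambda p: np.sum((k * p) ** 2))
        out[i] = k * best
    return out.reshape(w.shape)

def admm_prune_layer(w, grad_fn, rho=1e-3, lr=1e-2,
                     admm_iters=30, sgd_steps=20):
    """Per-layer ADMM pruning loop: alternate a proximal weight
    update against the pattern projection of the auxiliary variable.
    `grad_fn(w)` must return dLoss/dW evaluated on synthetic inputs."""
    w = w.astype(np.float32).copy()
    z = project_to_patterns(w)   # auxiliary variable, kept in the pattern set
    u = np.zeros_like(w)         # scaled dual variable
    for _ in range(admm_iters):
        # W-step: a few gradient steps on loss + (rho/2) * ||W - Z + U||^2.
        for _ in range(sgd_steps):
            w -= lr * (grad_fn(w) + rho * (w - z + u))
        # Z-step: Euclidean projection onto the pattern constraint set.
        z = project_to_patterns(w + u)
        # Dual update.
        u += w - z
    return project_to_patterns(w)  # hard-apply the patterns at the end
```

In the full framework, this per-layer loop would be driven by a loss on synthetic data and followed by the compiler-level code generation that exploits the resulting pattern sparsity; the sketch covers only the algorithm-level iteration.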