Paper Title
Local Learning on Transformers via Feature Reconstruction
Paper Authors
Paper Abstract
Transformers are becoming increasingly popular due to their superior performance over conventional convolutional neural networks (CNNs). However, transformers usually require much more memory to train than CNNs, which prevents their application in many low-resource settings. Local learning, which divides the network into several distinct modules and trains them individually, is a promising alternative to the end-to-end (E2E) training approach: it reduces the amount of memory required for training and increases parallelism. This paper is the first to apply local learning to transformers for this purpose. The standard CNN-based local learning method, InfoPro [32], reconstructs the input image at each module of a CNN. However, reconstructing the entire image does not generalize well. In this paper, we propose a new mechanism for each local module: instead of reconstructing the entire image, each module reconstructs its input features, generated by the previous module. We evaluate our approach on 4 commonly used datasets and 3 commonly used decoder structures with a Swin-Tiny backbone. The experiments show that our approach outperforms InfoPro-Transformer, our adaptation of InfoPro to a Transformer backbone, by up to 0.58% on the CIFAR-10, CIFAR-100, STL-10, and SVHN datasets, while using up to 12% less memory. Compared to the E2E approach, we require 36% less GPU memory when the network is divided into 2 modules and 45% less GPU memory when it is divided into 4 modules.
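The core mechanism can be illustrated with a minimal PyTorch sketch. This is not the authors' code: the class name, shapes, and the plain MSE objective are illustrative assumptions, and in the actual method the reconstruction term would be combined with a local task loss following the InfoPro framework. What the sketch does show is the key idea from the abstract: each local module detaches its input (so no gradient flows back to earlier modules) and trains a small decoder to reconstruct that input feature map, rather than the original image.

```python
import torch
import torch.nn as nn

class LocalModule(nn.Module):
    """One locally trained block: a backbone stage plus an auxiliary
    decoder that reconstructs the block's INPUT features, not the image.
    Names and structure are a sketch, not the paper's implementation."""

    def __init__(self, stage: nn.Module, decoder: nn.Module):
        super().__init__()
        self.stage = stage        # e.g. a group of transformer blocks
        self.decoder = decoder    # maps output features back to input features
        self.recon_loss = nn.MSELoss()

    def forward(self, x: torch.Tensor):
        # Detach so no gradient flows into earlier modules (local learning).
        x = x.detach()
        out = self.stage(x)
        # Auxiliary objective: reconstruct this module's input features.
        loss = self.recon_loss(self.decoder(out), x)
        return out, loss

# Toy usage with hypothetical shapes: token features from a previous module.
if __name__ == "__main__":
    d = 96  # embedding dim; 96 is the Swin-Tiny first-stage width (assumed here)
    stage = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))
    decoder = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))
    module = LocalModule(stage, decoder)

    feats = torch.randn(8, 49, d)   # (batch, tokens, dim)
    out, loss = module(feats)
    loss.backward()                 # updates only this module's parameters
    print(out.shape, loss.item())
```

Because each module's backward pass stops at its own input, activations for the other modules need not be kept in memory simultaneously, which is the source of the GPU-memory savings reported above.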