Paper Title
Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning
Paper Authors
Paper Abstract
Intermediate features of a pre-trained model have been shown informative for making accurate predictions on downstream tasks, even if the model backbone is kept frozen. The key challenge is how to utilize these intermediate features given their gigantic amount. We propose visual query tuning (VQT), a simple yet effective approach to aggregate intermediate features of Vision Transformers. Through introducing a handful of learnable ``query'' tokens to each layer, VQT leverages the inner workings of Transformers to ``summarize'' rich intermediate features of each layer, which can then be used to train the prediction heads of downstream tasks. As VQT keeps the intermediate features intact and only learns to combine them, it enjoys memory efficiency in training, compared to many other parameter-efficient fine-tuning approaches that learn to adapt features and need back-propagation through the entire backbone. This also suggests the complementary role between VQT and those approaches in transfer learning. Empirically, VQT consistently surpasses the state-of-the-art approach that utilizes intermediate features for transfer learning and outperforms full fine-tuning in many cases. Compared to parameter-efficient approaches that adapt features, VQT achieves much higher accuracy under memory constraints. Most importantly, VQT is compatible with these approaches to attain even higher accuracy, making it a simple add-on to further boost transfer learning.
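To make the mechanism concrete, below is a minimal sketch of the idea in PyTorch. It is an illustrative approximation, not the authors' released implementation: the paper appends the query tokens to each Transformer layer and reuses that layer's own frozen self-attention to produce the summaries, whereas this sketch uses a separate lightweight cross-attention per layer for self-containment. The module names (`VQTSummarizer`, `VQTHead`), the ViT-B/16-style dimensions, and the hook-based feature extraction are assumptions.

```python
import torch
import torch.nn as nn


class VQTSummarizer(nn.Module):
    """A few learnable query tokens summarize one layer's frozen features.

    Sketch only: a small trainable cross-attention stands in for the paper's
    reuse of the layer's own frozen attention weights.
    """

    def __init__(self, dim: int = 768, num_queries: int = 1, num_heads: int = 12):
        super().__init__()
        self.queries = nn.Parameter(torch.zeros(1, num_queries, dim))
        nn.init.trunc_normal_(self.queries, std=0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, layer_tokens: torch.Tensor) -> torch.Tensor:
        # layer_tokens: (B, N, dim) intermediate features of one frozen ViT layer
        q = self.queries.expand(layer_tokens.size(0), -1, -1)
        summary, _ = self.attn(q, layer_tokens, layer_tokens)
        return summary  # (B, num_queries, dim)


class VQTHead(nn.Module):
    """Combines per-layer summaries into a prediction; the backbone stays frozen."""

    def __init__(self, num_layers: int = 12, dim: int = 768,
                 num_queries: int = 1, num_classes: int = 100):
        super().__init__()
        self.summarizers = nn.ModuleList(
            VQTSummarizer(dim, num_queries) for _ in range(num_layers)
        )
        self.head = nn.Linear(num_layers * num_queries * dim, num_classes)

    def forward(self, intermediate_feats: list[torch.Tensor]) -> torch.Tensor:
        # intermediate_feats: one (B, N, dim) tensor per frozen layer, e.g. collected
        # with forward hooks under torch.no_grad(); only queries, the per-layer
        # attention, and the linear head receive gradients.
        summaries = [s(f) for s, f in zip(self.summarizers, intermediate_feats)]
        z = torch.cat(summaries, dim=1).flatten(1)  # (B, num_layers*num_queries*dim)
        return self.head(z)
```

Because the intermediate features themselves are never modified, they can be extracted once under `torch.no_grad()`, which is the source of the memory savings the abstract contrasts with feature-adapting methods that back-propagate through the whole backbone.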