Paper Title
Meta-attention for ViT-backed Continual Learning
Paper Authors
Paper Abstract
Continual learning is a longstanding research topic due to its crucial role in tackling continually arriving tasks. Up to now, the study of continual learning in computer vision has mainly been restricted to convolutional neural networks (CNNs). However, the newly emerging vision transformers (ViTs) have recently come to dominate the field of computer vision, leaving CNN-based continual learning methods lagging behind, since they can suffer severe performance degradation when applied directly to ViTs. In this paper, we study ViT-backed continual learning, striving for higher performance by building on recent advances in ViTs. Inspired by mask-based continual learning methods in CNNs, where a mask is learned for each task to adapt the pre-trained model to the new task, we propose MEta-ATtention (MEAT), i.e., attention to self-attention, which adapts a pre-trained ViT to new tasks without sacrificing performance on already learned tasks. Unlike prior mask-based methods such as Piggyback, where every parameter is associated with a corresponding mask, MEAT leverages the characteristics of ViTs and masks only a portion of their parameters. This makes MEAT more efficient and effective, with lower overhead and higher accuracy. Extensive experiments demonstrate that MEAT exhibits significant superiority over its state-of-the-art CNN counterparts, with 4.0~6.0% absolute gains in accuracy. Our code has been released at https://github.com/zju-vipa/MEAT-TIL.
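
To make the per-task masking idea concrete, the PyTorch-style snippet below is a rough illustrative sketch, not the authors' implementation: it applies a Piggyback-style binarized, learnable mask only to the qkv projection of a frozen transformer block, while the backbone weights stay fixed. The names BinarizeSTE and TaskMaskedQKV are hypothetical, and MEAT's actual masking scheme (attention to self-attention over a subset of ViT parameters) differs in detail.

import torch
import torch.nn as nn
import torch.nn.functional as F


class BinarizeSTE(torch.autograd.Function):
    # Threshold real-valued mask scores to {0, 1} in the forward pass and
    # pass gradients straight through in the backward pass.
    @staticmethod
    def forward(ctx, scores):
        return (scores >= 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output


class TaskMaskedQKV(nn.Module):
    # Frozen pre-trained qkv projection of a ViT block, gated by a learnable
    # per-task binary mask (hypothetical sketch; MEAT masks only part of the
    # attention parameters and differs in detail).
    def __init__(self, qkv: nn.Linear, num_tasks: int):
        super().__init__()
        self.qkv = qkv
        for p in self.qkv.parameters():  # shared backbone weights stay fixed
            p.requires_grad_(False)
        # One real-valued score per weight entry and per task; only these are
        # trained. Initialized slightly positive so each mask starts all-ones.
        self.mask_scores = nn.ParameterList(
            [nn.Parameter(torch.full_like(qkv.weight, 1e-2)) for _ in range(num_tasks)]
        )

    def forward(self, x, task_id: int):
        mask = BinarizeSTE.apply(self.mask_scores[task_id])
        return F.linear(x, self.qkv.weight * mask, self.qkv.bias)


# Usage: wrap the qkv projection of each transformer block and train only the
# mask scores of the current task, so masks for earlier tasks stay untouched.
qkv = nn.Linear(768, 3 * 768)            # stands in for a pre-trained ViT-B qkv layer
masked_qkv = TaskMaskedQKV(qkv, num_tasks=5)
tokens = torch.randn(2, 197, 768)        # (batch, tokens, embed_dim)
out = masked_qkv(tokens, task_id=0)      # -> shape (2, 197, 2304)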