Paper Title
Semantic-aware Message Broadcasting for Efficient Unsupervised Domain Adaptation
Paper Authors
Paper Abstract
Vision transformers have demonstrated great potential in abundant vision tasks. However, they also inevitably suffer from poor generalization capability when distribution shift occurs at test time (i.e., on out-of-distribution data). To mitigate this issue, we propose a novel method, Semantic-aware Message Broadcasting (SAMB), which enables more informative and flexible feature alignment for unsupervised domain adaptation (UDA). In particular, we study the attention module in the vision transformer and observe that the alignment space built on a single global class token lacks sufficient flexibility: the class token interacts with all image tokens in the same manner and ignores the rich semantics of different regions. In this paper, we aim to improve the richness of the alignment features by enabling semantic-aware adaptive message broadcasting. Specifically, we introduce a group of learned group tokens as nodes to aggregate global information from all image tokens, while encouraging different group tokens to adaptively focus their message broadcasting on different semantic regions. In this way, our message broadcasting encourages the group tokens to learn more informative and diverse representations for effective domain alignment. Moreover, we systematically study the effects of adversarial-based feature alignment (ADA) and pseudo-label-based self-training (PST) on UDA. We find that a simple two-stage training strategy combining ADA and PST can further improve the adaptation capability of the vision transformer. Extensive experiments on DomainNet, OfficeHome, and VisDA-2017 demonstrate the effectiveness of our methods for UDA.
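To make the core mechanism concrete, below is a minimal, hypothetical PyTorch sketch of the "group tokens aggregate from image tokens, then broadcast messages back to semantic regions" idea described in the abstract. It is not the authors' released implementation; the module name and all hyperparameters (e.g., `num_groups`, `embed_dim`) are illustrative assumptions.

```python
# Minimal, illustrative sketch (not the authors' code) of semantic-aware
# message broadcasting with learned group tokens.
import torch
import torch.nn as nn


class SemanticMessageBroadcast(nn.Module):
    def __init__(self, embed_dim: int = 768, num_groups: int = 8, num_heads: int = 8):
        super().__init__()
        # Learnable group tokens that act as aggregation nodes.
        self.group_tokens = nn.Parameter(torch.zeros(1, num_groups, embed_dim))
        nn.init.trunc_normal_(self.group_tokens, std=0.02)
        # Cross-attention: group tokens (queries) gather global information
        # from all image tokens.
        self.aggregate = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Cross-attention: image tokens (queries) receive messages broadcast
        # from the group tokens; the attention weights let each image region
        # listen mainly to the groups relevant to its semantics.
        self.broadcast = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, image_tokens: torch.Tensor) -> torch.Tensor:
        # image_tokens: (B, N, C) patch embeddings from a ViT block.
        b = image_tokens.size(0)
        groups = self.group_tokens.expand(b, -1, -1)
        # Step 1: each group token aggregates information from all image tokens.
        groups, _ = self.aggregate(groups, image_tokens, image_tokens)
        # Step 2: messages are broadcast back; each image token attends to the
        # group tokens most relevant to its semantic region.
        messages, _ = self.broadcast(image_tokens, groups, groups)
        return self.norm(image_tokens + messages)


if __name__ == "__main__":
    x = torch.randn(2, 196, 768)            # 14x14 patches at ViT-B/16 width
    out = SemanticMessageBroadcast()(x)
    print(out.shape)                         # torch.Size([2, 196, 768])
```

In this sketch the aggregated group tokens would serve as the alignment features for UDA (e.g., fed to a domain discriminator in the ADA stage), while the broadcast step injects the group-level semantics back into the image tokens; how the authors couple this with the two-stage ADA/PST training is described in the paper itself.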