Paper Title

GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation

Authors

Chenhongyi Yang, Jiarui Xu, Shalini De Mello, Elliot J. Crowley, Xiaolong Wang

Abstract

We present the Group Propagation Vision Transformer (GPViT): a novel non-hierarchical (i.e., non-pyramidal) transformer model designed for general visual recognition with high-resolution features. High-resolution features (or tokens) are a natural fit for tasks that involve perceiving fine-grained details, such as detection and segmentation, but exchanging global information between these features is expensive in memory and computation because of the way self-attention scales. We provide a highly efficient alternative, the Group Propagation Block (GP Block), to exchange global information. In each GP Block, features are first grouped together by a fixed number of learnable group tokens; we then perform Group Propagation, where global information is exchanged between the grouped features; finally, global information in the updated grouped features is returned to the image features through a transformer decoder. We evaluate GPViT on a variety of visual recognition tasks, including image classification, semantic segmentation, object detection, and instance segmentation. Our method achieves significant performance gains over previous works across all tasks, especially on tasks that require high-resolution outputs. For example, our GPViT-L3 outperforms Swin Transformer-B by 2.0 mIoU on ADE20K semantic segmentation with only half as many parameters. Project page: chenhongyiyang.com/projects/GPViT/GPViT
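
The three steps of a GP Block described in the abstract (grouping, propagation, return to image features) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The helpers `attend`, `gp_block`, and `score_counts` are hypothetical names. The sketch assumes single-head attention with no learned projections, and it uses plain self-attention among the groups as a stand-in for the propagation step, since the abstract does not specify that operator. The "transformer decoder" is reduced here to one cross-attention step with a residual connection.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Single-head scaled dot-product attention (assumption: no learned projections)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out


def gp_block(image_feats, group_tokens):
    """Hypothetical sketch of one GP Block on N image features and M group tokens."""
    # 1. Grouping: each group token queries all N image features.
    grouped = attend(group_tokens, image_feats, image_feats)
    # 2. Propagation (stand-in, an assumption): the M groups exchange information.
    grouped = attend(grouped, grouped, grouped)
    # 3. Return: each image feature queries the M updated groups, with a residual add.
    update = attend(image_feats, grouped, grouped)
    return [[x + u for x, u in zip(f, upd)]
            for f, upd in zip(image_feats, update)]


def score_counts(n, m):
    """Attention scores computed: full self-attention vs. this grouped sketch."""
    full = n * n
    grouped = n * m + m * m + n * m
    return full, grouped


random.seed(0)
N, M, D = 64, 4, 8  # illustrative sizes, not taken from the paper
feats = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
tokens = [[random.gauss(0, 1) for _ in range(D)] for _ in range(M)]
out = gp_block(feats, tokens)
print(len(out), len(out[0]))
print(score_counts(N, M))
```

Because the number of group tokens M is fixed, the count of attention scores in this sketch grows linearly with the number of image features N (2NM + M²), whereas full self-attention grows as N². That is the scaling argument the abstract makes for high-resolution features.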
