Paper Title


ScaleFormer: Revisiting the Transformer-based Backbones from a Scale-wise Perspective for Medical Image Segmentation

Authors

Huang, Huimin; Xie, Shiao; Lin, Lanfen; Iwamoto, Yutaro; Han, Xianhua; Chen, Yen-Wei; Tong, Ruofeng

Abstract


Recently, a variety of vision transformers have been developed for their capability of modeling long-range dependencies. In current transformer-based backbones for medical image segmentation, convolutional layers are either replaced with pure transformers, or transformers are added to the deepest encoder to learn global context. However, there are mainly two challenges from a scale-wise perspective: (1) intra-scale problem: existing methods fall short in extracting local-global cues in each scale, which may impair the signal propagation of small objects; (2) inter-scale problem: existing methods fail to explore distinctive information from multiple scales, which may hinder representation learning for objects with widely variable sizes, shapes and locations. To address these limitations, we propose a novel backbone, namely ScaleFormer, with two appealing designs: (1) A scale-wise intra-scale transformer is designed to couple the CNN-based local features with the transformer-based global cues in each scale, where the row-wise and column-wise global dependencies can be extracted by a lightweight Dual-Axis MSA. (2) A simple and effective spatial-aware inter-scale transformer is designed to interact among consensual regions in multiple scales, which can highlight the cross-scale dependency and resolve complex scale variations. Experimental results on different benchmarks demonstrate that our ScaleFormer outperforms the current state-of-the-art methods. The code is publicly available at: https://github.com/ZJUGiveLab/ScaleFormer.
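The abstract's key efficiency claim is that the Dual-Axis MSA factorizes full 2-D self-attention into row-wise and column-wise attention, each of which only attends along one spatial axis. The paper's actual module is not reproduced here; the following is a minimal single-head NumPy sketch of that general idea (attention restricted to one axis at a time, then combined by summation — the combination rule is an assumption for illustration), showing why the cost drops from O((HW)^2) to O(HW·(H+W)):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axis_attention(x, axis):
    """Single-head self-attention applied independently along one spatial axis.

    x: feature map of shape (H, W, C). With axis=1 each row of length W
    attends within itself (row-wise); with axis=0 each column of length H
    attends within itself (column-wise). For simplicity Q = K = V = x,
    i.e. the learned projection matrices are omitted.
    """
    if axis == 0:
        x = x.transpose(1, 0, 2)  # (W, H, C): attend over H within each column
    # scores: (N, L, L) where L is the attended axis length
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    out = softmax(scores) @ x     # (N, L, C)
    if axis == 0:
        out = out.transpose(1, 0, 2)  # back to (H, W, C)
    return out

def dual_axis_attention(x):
    """Combine row-wise and column-wise attention (here: simple sum)."""
    return axis_attention(x, axis=0) + axis_attention(x, axis=1)
```

As a sanity check, a constant feature map is a fixed point of each axis attention (uniform weights average identical tokens), so the summed output is exactly twice the input.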
