Paper Title
Class-Aware Adversarial Transformers for Medical Image Segmentation
Paper Authors
Paper Abstract
Transformers have made remarkable progress towards modeling long-range dependencies within the medical image analysis domain. However, current transformer-based models suffer from several disadvantages: (1) existing methods fail to capture important image features due to their naive tokenization schemes; (2) the models suffer from information loss because they consider only single-scale feature representations; and (3) the segmentation label maps generated by the models are not accurate enough without considering rich semantic contexts and anatomical textures. In this work, we present CASTformer, a novel adversarial transformer for 2D medical image segmentation. First, we take advantage of a pyramid structure to construct multi-scale representations and handle multi-scale variations. We then design a novel class-aware transformer module to better learn the discriminative regions of objects with semantic structures. Lastly, we utilize an adversarial training strategy that boosts segmentation accuracy and, correspondingly, allows a transformer-based discriminator to capture high-level semantically correlated content and low-level anatomical features. Our experiments demonstrate that CASTformer dramatically outperforms previous state-of-the-art transformer-based approaches on three benchmarks, obtaining 2.54%-5.88% absolute improvements in Dice over previous models. Further qualitative experiments provide a more detailed picture of the model's inner workings, shed light on the challenges of improving transparency, and demonstrate that transfer learning can greatly improve performance and reduce the amount of medical image data required for training, making CASTformer a strong starting point for downstream medical image analysis tasks.
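To make the adversarial training strategy described in the abstract concrete, the Python/PyTorch sketch below shows the generic pattern of training a segmentation "generator" with a supervised loss plus an adversarial term, while a discriminator learns to distinguish predicted label maps from ground-truth ones paired with the input image. This is a minimal illustration, not the authors' implementation: the `TinySegmenter` and `TinyDiscriminator` modules, the loss weighting, and all hyperparameters are assumptions; in CASTformer both the generator (pyramid, class-aware transformer) and the discriminator are transformer-based.

```python
# Minimal sketch of adversarial training for segmentation (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySegmenter(nn.Module):
    """Stand-in for the segmentation generator (CASTformer uses a class-aware pyramid transformer)."""
    def __init__(self, in_ch=1, num_classes=9):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_classes, 1),
        )

    def forward(self, x):
        return self.net(x)  # per-pixel class logits

class TinyDiscriminator(nn.Module):
    """Stand-in for the transformer-based discriminator: scores (image, label map) pairs."""
    def __init__(self, in_ch=1, num_classes=9):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch + num_classes, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, stride=2, padding=1),
        )

    def forward(self, image, label_probs):
        return self.net(torch.cat([image, label_probs], dim=1))  # patch-level realism scores

def train_step(seg, disc, opt_seg, opt_disc, image, target, num_classes=9, adv_weight=0.1):
    """One adversarial step: update the discriminator, then the segmenter."""
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()

    # Discriminator: real (ground-truth) pairs vs. fake (predicted) pairs.
    with torch.no_grad():
        fake_probs = seg(image).softmax(dim=1)
    d_real = disc(image, onehot)
    d_fake = disc(image, fake_probs)
    loss_d = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    opt_disc.zero_grad(); loss_d.backward(); opt_disc.step()

    # Segmenter: supervised cross-entropy plus adversarial term that rewards
    # label maps the discriminator judges as realistic.
    logits = seg(image)
    d_pred = disc(image, logits.softmax(dim=1))
    loss_sup = F.cross_entropy(logits, target)
    loss_adv = F.binary_cross_entropy_with_logits(d_pred, torch.ones_like(d_pred))
    loss_g = loss_sup + adv_weight * loss_adv
    opt_seg.zero_grad(); loss_g.backward(); opt_seg.step()
    return loss_d.item(), loss_g.item()
```

Because the discriminator sees the image together with the full label map, its feedback can penalize both low-level anatomical inconsistencies and high-level semantic implausibilities, which is the intuition behind the adversarial term in the abstract.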