Paper Title
3D UX-Net: A Large Kernel Volumetric ConvNet Modernizing Hierarchical Transformer for Medical Image Segmentation
Paper Authors
Paper Abstract
Recent 3D medical ViTs (e.g., SwinUNETR) achieve state-of-the-art performance on several 3D volumetric data benchmarks, including 3D medical image segmentation. Hierarchical transformers (e.g., Swin Transformers) reintroduced several ConvNet priors and further enhanced the practical viability of adapting volumetric segmentation to 3D medical datasets. The effectiveness of these hybrid approaches is largely credited to the large receptive field of non-local self-attention and the large number of model parameters. In this work, we propose a lightweight volumetric ConvNet, termed 3D UX-Net, which adapts the hierarchical transformer using ConvNet modules for robust volumetric segmentation. Specifically, we revisit volumetric depth-wise convolutions with large kernel sizes (e.g., starting from $7\times7\times7$) to enable larger global receptive fields, inspired by Swin Transformer. We further substitute the multi-layer perceptron (MLP) in Swin Transformer blocks with pointwise depth convolutions and enhance model performance with fewer normalization and activation layers, thus reducing the number of model parameters. 3D UX-Net competes favorably with current SOTA transformers (e.g., SwinUNETR) on three challenging public datasets of volumetric brain and abdominal imaging: 1) MICCAI Challenge 2021 FLARE, 2) MICCAI Challenge 2021 FeTA, and 3) MICCAI Challenge 2022 AMOS. 3D UX-Net consistently outperforms SwinUNETR, with improvements from 0.929 to 0.938 Dice (FLARE2021) and from 0.867 to 0.874 Dice (FeTA2021). We further evaluate the transfer learning capability of 3D UX-Net on AMOS2022 and demonstrate a further improvement of $2.27\%$ Dice (from 0.880 to 0.900). The source code for our proposed model is available at https://github.com/MASILab/3DUX-Net.
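The block design described in the abstract (a large-kernel volumetric depth-wise convolution for a wide receptive field, pointwise $1\times1\times1$ convolutions in place of the transformer MLP, and a single normalization and activation layer) can be illustrated with a short sketch. The following is a minimal PyTorch sketch, not the authors' reference implementation (see the linked repository for that); the module name `LargeKernelBlock3D`, the channel width, the expansion ratio, the use of single-group GroupNorm as the normalization, and the residual placement are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class LargeKernelBlock3D(nn.Module):
    """Sketch of a 3D UX-Net-style block (assumed layout, not the official code):
    a large-kernel depth-wise 3D convolution for a wide receptive field,
    followed by pointwise (1x1x1) convolutions standing in for the transformer
    MLP, with only one normalization and one activation layer per block."""

    def __init__(self, channels: int, kernel_size: int = 7, expansion: int = 4):
        super().__init__()
        # Depth-wise volumetric convolution with a large kernel (e.g., 7x7x7).
        self.dwconv = nn.Conv3d(
            channels, channels,
            kernel_size=kernel_size,
            padding=kernel_size // 2,
            groups=channels,
        )
        # Single normalization layer (GroupNorm with one group).
        self.norm = nn.GroupNorm(num_groups=1, num_channels=channels)
        # Pointwise convolutions replacing the MLP of a transformer block.
        self.pw1 = nn.Conv3d(channels, expansion * channels, kernel_size=1)
        self.act = nn.GELU()  # single activation layer
        self.pw2 = nn.Conv3d(expansion * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dwconv(x)
        x = self.norm(x)
        x = self.pw2(self.act(self.pw1(x)))
        return x + residual


if __name__ == "__main__":
    block = LargeKernelBlock3D(channels=48)
    x = torch.randn(1, 48, 32, 32, 32)  # (batch, channels, depth, height, width)
    print(block(x).shape)  # torch.Size([1, 48, 32, 32, 32])
```

Because the spatial mixing is done by a grouped (depth-wise) convolution and the channel mixing by $1\times1\times1$ convolutions, the parameter count grows roughly linearly with the kernel volume rather than quadratically with the channel width, which is consistent with the abstract's emphasis on a lightweight design.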