ViTBIS: Vision Transformer for Biomedical Image Segmentation

Paper Title

ViTBIS: Vision Transformer for Biomedical Image Segmentation

Authors

Sagar, Abhinav

Abstract

In this paper, we propose a novel network named Vision Transformer for Biomedical Image Segmentation (ViTBIS). Our network splits the input feature maps into three parts, which are processed with $1\times 1$, $3\times 3$ and $5\times 5$ convolutions in both the encoder and the decoder. A concatenation operator merges the features before they are fed to three consecutive transformer blocks with attention mechanisms embedded inside them. Skip connections connect the encoder and decoder transformer blocks. Similarly, transformer blocks and a multi-scale architecture are used in the decoder before the features are linearly projected to produce the output segmentation map. We test the performance of our network on the Synapse multi-organ segmentation dataset, the Automated Cardiac Diagnosis Challenge dataset, the Brain Tumour MRI segmentation dataset and the Spleen CT segmentation dataset. Without bells and whistles, our network outperforms most of the previous state-of-the-art CNN and transformer-based models, using the Dice score and the Hausdorff distance as the evaluation metrics.
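
The two evaluation metrics named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' evaluation code: the function names (`dice_score`, `foreground_points`, `hausdorff_distance`), the nested-list mask format, and the convention for empty masks are assumptions made for illustration. The sketch computes the plain (maximum) Hausdorff distance over foreground pixels; the abstract does not say which variant the paper reports.

```python
import math


def dice_score(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks (nested lists of 0/1)."""
    flat_pred = [v for row in pred for v in row]
    flat_target = [v for row in target for v in row]
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def foreground_points(mask):
    """Return the (row, col) coordinates of all non-zero pixels."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def hausdorff_distance(pred, target):
    """Symmetric Hausdorff distance between the foreground pixel sets of two masks."""
    a = foreground_points(pred)
    b = foreground_points(target)
    if not a or not b:
        raise ValueError("Hausdorff distance is undefined for an empty mask")

    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


# Toy example: the prediction has one extra foreground pixel.
pred = [[1, 1, 0],
        [0, 0, 0]]
target = [[1, 0, 0],
          [0, 0, 0]]
print(dice_score(pred, target))
print(hausdorff_distance(pred, target))
```

A higher Dice score indicates more overlap between prediction and ground truth, while a lower Hausdorff distance indicates that the predicted boundary stays closer to the true one. The brute-force nearest-point search is quadratic in the number of foreground pixels, so it suits small examples only.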

Paper Title