Paper Title

Self-Supervised Pretraining for 2D Medical Image Segmentation

Paper Authors

András Kalapos, Bálint Gyires-Tóth

Paper Abstract

Supervised machine learning provides state-of-the-art solutions to a wide range of computer vision problems. However, the need for copious labelled training data limits the capabilities of these algorithms in scenarios where such input is scarce or expensive. Self-supervised learning offers a way to lower the need for manually annotated data by pretraining models for a specific domain on unlabelled data. In this approach, labelled data are solely required to fine-tune models for downstream tasks. Medical image segmentation is a field where labelling data requires expert knowledge and collecting large labelled datasets is challenging; therefore, self-supervised learning algorithms promise substantial improvements in this field. Despite this, self-supervised learning algorithms are used rarely to pretrain medical image segmentation networks. In this paper, we elaborate and analyse the effectiveness of supervised and self-supervised pretraining approaches on downstream medical image segmentation, focusing on convergence and data efficiency. We find that self-supervised pretraining on natural images and target-domain-specific images leads to the fastest and most stable downstream convergence. In our experiments on the ACDC cardiac segmentation dataset, this pretraining approach achieves 4-5 times faster fine-tuning convergence compared to an ImageNet pretrained model. We also show that this approach requires less than five epochs of pretraining on domain-specific data to achieve such improvement in the downstream convergence time. Finally, we find that, in low-data scenarios, supervised ImageNet pretraining achieves the best accuracy, requiring less than 100 annotated samples to realise close to minimal error.
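
The abstract describes a two-stage workflow: self-supervised pretraining of an encoder on unlabelled (natural and domain-specific) images, followed by supervised fine-tuning on the labelled downstream segmentation task. The sketch below illustrates only the fine-tuning stage, assuming a U-Net with a ResNet-50 encoder built with the segmentation_models_pytorch library; this architecture, the library choice, the Dice loss, and all hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of the fine-tuning stage described in the abstract.
# The self-supervised pretraining stage is assumed to have already produced
# encoder weights; here ImageNet weights stand in for them.
import torch
import segmentation_models_pytorch as smp
from segmentation_models_pytorch.losses import DiceLoss

# U-Net whose encoder is initialised from pretrained weights (assumed setup).
model = smp.Unet(
    encoder_name="resnet50",      # assumed backbone
    encoder_weights="imagenet",   # stand-in for the pretrained encoder weights
    in_channels=1,                # single-channel cardiac MR slices
    classes=4,                    # e.g. background + 3 cardiac structures (ACDC)
)

loss_fn = DiceLoss(mode="multiclass")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def finetune_step(images: torch.Tensor, masks: torch.Tensor) -> float:
    """One supervised fine-tuning step on a batch of labelled slices."""
    optimizer.zero_grad()
    logits = model(images)          # (B, 4, H, W) class logits
    loss = loss_fn(logits, masks)   # masks: (B, H, W) integer class labels
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch, only to show the expected tensor shapes.
images = torch.randn(2, 1, 224, 224)
masks = torch.randint(0, 4, (2, 224, 224))
print(finetune_step(images, masks))
```

In this two-stage view, only the fine-tuning stage above consumes annotated masks; the pretraining stage that precedes it uses unlabelled images, which is how the approach reduces the amount of manually labelled data required.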
