Assessing generalisability of deep learning-based polyp detection and segmentation methods through a computer vision challenge

Paper title

Assessing generalisability of deep learning-based polyp detection and segmentation methods through a computer vision challenge

Authors

Ali, Sharib; Ghatwary, Noha; Jha, Debesh; Isik-Polat, Ece; Polat, Gorkem; Yang, Chen; Li, Wuyang; Galdran, Adrian; Ballester, Miguel-Ángel González; Thambawita, Vajira; Hicks, Steven; Poudel, Sahadev; Lee, Sang-Woong; Jin, Ziyi; Gan, Tianyuan; Yu, ChengHui; Yan, JiangPeng; Yeo, Doyeob; Lee, Hyunseok; Tomar, Nikhil Kumar; Haithmi, Mahmood; Ahmed, Amr; Riegler, Michael A.; Daul, Christian; Halvorsen, Pål; Rittscher, Jens; Salem, Osama E.; Lamarque, Dominique; Cannizzaro, Renato; Realdon, Stefano; de Lange, Thomas; East, James E.

Abstract

Polyps are well-known cancer precursors identified by colonoscopy. However, variability in their size, location, and surface largely affects identification, localisation, and characterisation. Moreover, colonoscopic surveillance and removal of polyps (referred to as polypectomy) are highly operator-dependent procedures. Missed detections and incomplete removal of colonic polyps are frequent because of their variable nature, the difficulty of delineating the abnormality, the high recurrence rates, and the anatomical topography of the colon. There have been several developments in realising automated methods for both detection and segmentation of these polyps using machine learning. However, the major drawback of most of these methods is their limited ability to generalise to out-of-sample unseen datasets that come from different centres, modalities, and acquisition systems. To test this hypothesis rigorously, we curated a multi-centre and multi-population dataset acquired from multiple colonoscopy systems and challenged teams comprising machine learning experts to develop robust automated detection and segmentation methods as part of our crowd-sourced Endoscopic computer vision challenge (EndoCV) 2021. In this paper, we analyse the detection results of the top four teams (among seven) and the segmentation results of the top five teams (among 16). Our analyses demonstrate that the top-ranking teams concentrated on accuracy (i.e., an overall Dice score above 80% on different validation sets) over the real-time performance required for clinical applicability. We further dissect the methods and provide an experiment-based hypothesis that reveals the need for improved generalisability to tackle the diversity present in multi-centre datasets.
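
For readers unfamiliar with the Dice score mentioned in the abstract, the following is a minimal sketch of the standard definition for two binary segmentation masks, Dice = 2|A ∩ B| / (|A| + |B|). It is not the challenge's evaluation code, which is not shown here. The function name `dice_score`, the nested-list mask representation, and the choice to return 1.0 when both masks are empty are all assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1.

    Illustrative helper only (hypothetical name, not from the paper).
    """
    # Flatten both masks into one-dimensional lists of pixels.
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels marked as polyp in both the prediction and the ground truth.
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    # Total number of polyp pixels across the two masks.
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_target if t)

    if total == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Toy example: each mask marks 3 pixels, 2 of which overlap.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
score = dice_score(pred, target)
print(f"Dice score: {score:.3f}")
```

Under this definition, a score above 0.8 on a validation set means that the overlap between predicted and annotated polyp pixels is large relative to the combined size of the two masks.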
