Title
Outlier Guided Optimization of Abdominal Segmentation
Authors
Abstract
Abdominal multi-organ segmentation of computed tomography (CT) images has been the subject of extensive research. It presents a substantial challenge in medical image processing, as the shape and distribution of abdominal organs can vary greatly across the population and within an individual over time. While continuously integrating novel datasets into the training set can improve segmentation performance, collecting data at scale is not only costly but also impractical in some contexts. Moreover, it remains unclear what marginal value additional data offer. Herein, we propose a single-pass active learning method based on human quality assurance (QA). We built on a pre-trained 3D U-Net model for abdominal multi-organ segmentation and augmented the dataset with either outliers (i.e., exemplars for which the baseline algorithm failed) or inliers (i.e., exemplars for which the baseline algorithm worked). The new models were trained on the augmented datasets, using 5-fold cross-validation for the outlier data and withheld outlier samples for the inlier data. Manual labeling of outliers increased Dice scores by 0.130, compared with an increase of 0.067 for inliers (p < 0.001, two-tailed paired t-test). By adding 5 to 37 inliers or outliers to training, we found that the marginal value of adding outliers is higher than that of adding inliers. In summary, single-organ performance improved without diminishing multi-organ performance or significantly increasing training time. Hence, identifying and correcting baseline failure cases is an effective and efficient way to select training data that improve algorithm performance.
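The two quantities the abstract reports, per-organ Dice scores and a two-tailed paired t-test on them, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes flattened integer label maps (one organ ID per voxel, so a 3D volume would be flattened first), treats an organ absent from both maps as perfect agreement, and computes only the paired t statistic with the Python standard library. The function names `dice_score` and `paired_t_statistic` and all demo numbers are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def dice_score(pred_labels: Sequence[int], true_labels: Sequence[int], organ: int) -> float:
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for one organ label.

    Both inputs are flattened label maps of equal length (one organ ID per voxel).
    """
    if len(pred_labels) != len(true_labels):
        raise ValueError("label maps must have the same number of voxels")
    # Voxel indices assigned to this organ in the prediction and in the reference.
    pred = {i for i, v in enumerate(pred_labels) if v == organ}
    truth = {i for i, v in enumerate(true_labels) if v == organ}
    denom = len(pred) + len(truth)
    if denom == 0:
        # Organ absent from both maps: scored as perfect agreement (assumed convention).
        return 1.0
    return 2.0 * len(pred & truth) / denom


def paired_t_statistic(after: Sequence[float], before: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic on per-case differences (after - before) and its degrees of freedom."""
    if len(after) != len(before) or len(after) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("paired differences have zero variance")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Toy six-voxel label maps: 0 = background, 1 and 2 = two hypothetical organs.
    pred = [0, 1, 1, 2, 2, 0]
    truth = [0, 1, 2, 2, 2, 0]
    print(dice_score(pred, truth, 1), dice_score(pred, truth, 2))

    # Made-up per-case Dice scores before and after retraining (illustrative only).
    before = [0.70, 0.65, 0.80, 0.60]
    after = [0.82, 0.78, 0.85, 0.75]
    print(paired_t_statistic(after, before))
```

For a two-tailed p-value, the statistic is compared against a t distribution with n − 1 degrees of freedom; `scipy.stats.ttest_rel(after, before)` performs the same paired test and reports the p-value directly. The demo values are invented and do not reproduce the paper's results.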