Title
Quantifying and Leveraging Predictive Uncertainty for Medical Image Assessment
Abstract
The interpretation of medical images is a challenging task, often complicated by artifacts, occlusions, limited contrast, and more. Most notable is the case of chest radiography, where there is high inter-rater variability in the detection and classification of abnormalities. This is largely due to inconclusive evidence in the data or to subjective definitions of disease appearance. An additional example is the classification of anatomical views based on 2D ultrasound images. Often, the anatomical context captured in a frame is not sufficient to recognize the underlying anatomy. Current machine learning solutions for these problems are typically limited to providing probabilistic predictions, relying on the capacity of the underlying models to cope with limited information and a high degree of label noise. In practice, however, this leads to overconfident systems that generalize poorly to unseen data. To account for this, we propose a system that learns not only the probabilistic estimate for classification, but also an explicit uncertainty measure that captures the system's confidence in the predicted output. We argue that this approach is essential to account for the inherent ambiguity of medical images from different radiologic exams, including computed radiography, ultrasonography and magnetic resonance imaging. In our experiments, we demonstrate that sample rejection based on the predicted uncertainty can significantly improve the ROC-AUC for various tasks, e.g., by 8% to 0.91 with an expected rejection rate of under 25% for the classification of different abnormalities in chest radiographs. In addition, we show that by using uncertainty-driven bootstrapping to filter the training data, one can achieve a significant increase in robustness and accuracy.
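The abstract does not say how the uncertainty is computed or how samples are rejected. The Python sketch below illustrates the general idea under stated assumptions. It assumes a binary classifier whose two outputs are read as non-negative evidence for a Beta distribution. This is the evidential, or subjective-logic, formulation, chosen here for illustration and not necessarily the authors' model. The sketch then rejects the most uncertain fraction of samples and computes ROC-AUC on the samples that remain. All function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def beta_prediction(e_pos: float, e_neg: float) -> Tuple[float, float]:
    """Binary prediction from evidence (ASSUMED evidential formulation).

    Beta parameters are evidence + 1; the uncertainty is K / S with K = 2
    classes and S = alpha + beta, so zero evidence gives uncertainty 1.
    Returns (probability of the positive class, uncertainty in (0, 1]).
    """
    alpha, beta = e_pos + 1.0, e_neg + 1.0
    strength = alpha + beta
    return alpha / strength, 2.0 / strength


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via the Mann-Whitney statistic (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def keep_most_certain(uncertainties: Sequence[float], reject_rate: float) -> List[int]:
    """Indices kept after rejecting the `reject_rate` fraction of samples
    with the highest predicted uncertainty (hypothetical rejection rule)."""
    if not 0.0 <= reject_rate < 1.0:
        raise ValueError("reject_rate must be in [0, 1)")
    n_reject = int(len(uncertainties) * reject_rate)
    # Sort ascending by uncertainty; the most uncertain samples end up last.
    order = sorted(range(len(uncertainties)), key=lambda i: uncertainties[i])
    return sorted(order[: len(uncertainties) - n_reject])


def auc_with_rejection(labels: Sequence[int], probs: Sequence[float],
                       uncertainties: Sequence[float], reject_rate: float) -> float:
    """ROC-AUC computed only on the samples that survive rejection."""
    kept = keep_most_certain(uncertainties, reject_rate)
    return roc_auc([labels[i] for i in kept], [probs[i] for i in kept])


if __name__ == "__main__":
    # Toy data: the two misranked samples (indices 6 and 7) are also the
    # ones the model flags as most uncertain.
    labels = [1, 1, 1, 0, 0, 0, 1, 0]
    probs = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.4, 0.6]
    unc = [0.1, 0.2, 0.1, 0.1, 0.2, 0.1, 0.9, 0.8]
    print(roc_auc(labels, probs))                        # 0.9375
    print(auc_with_rejection(labels, probs, unc, 0.25))  # 1.0
```

In this toy case, rejecting 25% of the samples removes exactly the misranked pair, so the AUC on the retained samples rises from 0.9375 to 1.0. The same `keep_most_certain` helper could also drop the most uncertain *training* samples before retraining. That is one simple reading of "uncertainty-driven bootstrapping"; the abstract does not specify the actual filtering rule.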