Paper Title
A Benchmark of Medical Out of Distribution Detection
Authors
Abstract
Motivation: Deep learning models deployed on medical tasks can be equipped with Out-of-Distribution Detection (OoDD) methods to avoid erroneous predictions. However, it is unclear which OoDD method should be used in practice. Specific Problem: Systems trained on one particular domain of images cannot be expected to perform accurately on images from a different domain. These images should be flagged by an OoDD method prior to diagnosis. Our approach: This paper defines three categories of OoD examples and benchmarks popular OoDD methods in three domains of medical imaging: chest X-ray, fundus imaging, and histology slides. Results: Our experiments show that although the methods yield good results on some categories of out-of-distribution samples, they fail to recognize images close to the training distribution. Conclusion: We find that a simple binary classifier on the feature representation has the best accuracy and AUPRC on average. Users of diagnostic tools that employ these OoDD methods should remain vigilant: images very close to the training distribution, yet not in it, could yield unexpected results.