Benchmarking the Robustness of Deep Neural Networks to Common Corruptions in Digital Pathology

Paper Title

Benchmarking the Robustness of Deep Neural Networks to Common Corruptions in Digital Pathology

Authors

Yunlong Zhang, Yuxuan Sun, Honglin Li, Sunyi Zheng, Chenglu Zhu, Lin Yang

Abstract

When designing a diagnostic model for clinical use, it is crucial to ensure that the model is robust to a wide range of image corruptions. Here, an easy-to-use benchmark is established to evaluate how deep neural networks perform on corrupted pathology images. Specifically, corrupted images are generated by injecting nine types of common corruptions into validation images. In addition, two classification metrics and one ranking metric are designed to evaluate prediction and confidence performance under corruption. Evaluating on the two resulting benchmark datasets, we find that (1) a variety of deep neural network models suffer a significant accuracy decrease on corrupted images (double the error on clean images) and produce unreliable confidence estimates; and (2) the correlation between validation and test errors is low, whereas replacing the validation set with our benchmark increases this correlation. Our code is available at https://github.com/superjamessyx/robustness_benchmark.
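
The evaluation idea in the abstract (corrupt the validation images, then compare the error against the clean error) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the nine corruption types, so additive Gaussian noise is used as a stand-in, and the function names `add_gaussian_noise`, `error_rate`, and `relative_error`, the grayscale list-of-lists image format, and the noise scale are all assumptions made for illustration.

```python
import random


def add_gaussian_noise(image, severity=1, seed=0):
    """Hypothetical corruption: additive Gaussian noise, clipped to [0, 1].

    `image` is a 2D list of grayscale floats in [0, 1] (an assumed format).
    """
    rng = random.Random(seed)
    sigma = 0.04 * severity  # assumed noise scale, not a value from the paper
    return [
        [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]


def error_rate(predictions, labels):
    """Fraction of predictions that differ from the ground-truth labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)


def relative_error(clean_error, corrupted_error):
    """Ratio of corrupted error to clean error; 2.0 means the error doubled."""
    if clean_error <= 0:
        raise ValueError("clean_error must be positive")
    return corrupted_error / clean_error


# Corrupt a toy 4x4 "image" and compare illustrative (made-up) error rates.
clean_image = [[0.5] * 4 for _ in range(4)]
noisy_image = add_gaussian_noise(clean_image, severity=2)
ratio = relative_error(clean_error=0.1, corrupted_error=0.2)
```

In practice, the predictions passed to `error_rate` would come from running a trained classifier on the clean and the corrupted validation sets; the error values in the last line are placeholders, not results from the paper.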
