Paper Title

DermX: an end-to-end framework for explainable automated dermatological diagnosis

Paper Authors

Raluca Jalaboi, Frederik Faye, Mauricio Orbes-Arteaga, Dan Jørgensen, Ole Winther, Alfiia Galimzianova

Paper Abstract

Dermatological diagnosis automation is essential in addressing the high prevalence of skin diseases and critical shortage of dermatologists. Despite approaching expert-level diagnosis performance, convolutional neural network (ConvNet) adoption in clinical practice is impeded by their limited explainability, and by subjective, expensive explainability validations. We introduce DermX and DermX+, an end-to-end framework for explainable automated dermatological diagnosis. DermX is a clinically-inspired explainable dermatological diagnosis ConvNet, trained using DermXDB, a 554 image dataset annotated by eight dermatologists with diagnoses, supporting explanations, and explanation attention maps. DermX+ extends DermX with guided attention training for explanation attention maps. Both methods achieve near-expert diagnosis performance, with DermX, DermX+, and dermatologist F1 scores of 0.79, 0.79, and 0.87, respectively. We assess the explanation performance in terms of identification and localization by comparing model-selected with dermatologist-selected explanations, and gradient-weighted class-activation maps with dermatologist explanation maps, respectively. DermX obtained an identification F1 score of 0.77, while DermX+ obtained 0.79. The localization F1 score is 0.39 for DermX and 0.35 for DermX+. These results show that explainability does not necessarily come at the expense of predictive power, as our high-performance models provide expert-inspired explanations for their diagnoses without lowering their diagnosis performance.
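The abstract reports explanation performance as an identification F1 score (model-selected vs. dermatologist-selected explanation characteristics) and a localization F1 score (gradient-weighted class-activation maps vs. dermatologist explanation maps). The sketch below illustrates, under stated assumptions, how such scores could be computed; the function names, the 0.5 binarization threshold, and the array shapes are illustrative choices rather than details from the paper, and the only library call used is scikit-learn's standard f1_score.

```python
import numpy as np
from sklearn.metrics import f1_score


def identification_f1(pred_characteristics, derm_characteristics):
    """Micro-averaged F1 between model-selected and dermatologist-selected
    explanation characteristics, both encoded as binary multi-label matrices
    of shape (n_images, n_characteristics). The micro averaging is an
    illustrative assumption, not necessarily the paper's exact aggregation."""
    return f1_score(derm_characteristics, pred_characteristics, average="micro")


def localization_f1(gradcam_map, derm_attention_map, threshold=0.5):
    """Pixel-wise F1 between a thresholded Grad-CAM heatmap and a
    dermatologist-drawn explanation map, both scaled to [0, 1] and of the
    same spatial shape. The 0.5 threshold is an assumed choice."""
    pred_mask = (gradcam_map >= threshold).astype(int).ravel()
    true_mask = (derm_attention_map >= threshold).astype(int).ravel()
    return f1_score(true_mask, pred_mask, zero_division=0)


# Toy usage with random data, only to show the expected input shapes.
rng = np.random.default_rng(0)
pred_chars = rng.integers(0, 2, size=(10, 32))   # 10 images, 32 characteristics
derm_chars = rng.integers(0, 2, size=(10, 32))
print("identification F1:", identification_f1(pred_chars, derm_chars))

cam = rng.random((224, 224))        # stand-in for a Grad-CAM heatmap
derm_map = rng.random((224, 224))   # stand-in for a dermatologist map
print("localization F1:", localization_f1(cam, derm_map))
```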
