Paper Title

Addressing Artificial Intelligence Bias in Retinal Disease Diagnostics

Authors

Burlina, Philippe; Joshi, Neil; Paul, William; Pacheco, Katia D.; Bressler, Neil M.

Abstract

This study evaluated generative methods for potentially mitigating AI bias in diagnosing diabetic retinopathy (DR), where the bias results from training data imbalance or from domain generalization, which occurs when deep learning systems (DLS) face concepts at test/inference time that they were not initially trained on. The public domain Kaggle-EyePACS dataset (88,692 fundi and 44,346 individuals, originally diverse for ethnicity) was modified by adding clinician-annotated labels and constructing an artificial scenario of data imbalance and domain generalization by disallowing training (but not testing) exemplars for images of retinas with DR warranting referral (DR-referable) and from darker-skin individuals, who presumably have greater concentration of melanin within uveal melanocytes, on average, contributing to retinal image pigmentation. A traditional/baseline diagnostic DLS was compared against new DLSs that would use training data augmented via generative models for debiasing. Accuracy (95% confidence interval [CI]) of the baseline diagnostic DLS for fundus images of lighter-skin individuals was 73.0% (66.9%, 79.2%) vs. 60.5% (53.5%, 67.3%) for darker-skin individuals, demonstrating bias/disparity (delta=12.5%) (Welch t-test t=2.670, P=.008) in AI performance across protected subpopulations. Using novel generative methods for addressing missing subpopulation training data (DR-referable darker-skin) instead achieved accuracy of 72.0% (65.8%, 78.2%) for lighter-skin individuals and 71.5% (65.2%, 77.8%) for darker-skin individuals, demonstrating closer parity (delta=0.5%) in accuracy across subpopulations (Welch t-test t=0.111, P=.912). Findings illustrate how data imbalance and domain generalization can lead to disparity of accuracy across subpopulations, and show that novel generative methods of synthetic fundus images may play a role in debiasing AI.
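The reported Welch t statistics can be checked from the accuracies alone. The following is a minimal sketch, not the authors' code. The abstract does not state the size of each test group, so the value of 200 images per group is an assumption, chosen because it is consistent with the widths of the reported confidence intervals. The helper name `welch_t` is hypothetical.

```python
import math


def welch_t(correct_a, n_a, correct_b, n_b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom
    for two independent groups of 0/1 (wrong/correct) outcomes."""
    p_a, p_b = correct_a / n_a, correct_b / n_b
    # Unbiased sample variance of a 0/1 variable: p(1 - p) * n / (n - 1)
    var_a = p_a * (1 - p_a) * n_a / (n_a - 1)
    var_b = p_b * (1 - p_b) * n_b / (n_b - 1)
    # Squared standard error of each group's mean accuracy
    se2_a, se2_b = var_a / n_a, var_b / n_b
    t = (p_a - p_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# ASSUMPTION: 200 test images per subpopulation (not stated in the abstract).
N = 200

# Baseline DLS: 73.0% lighter-skin vs. 60.5% darker-skin accuracy
baseline_t, baseline_df = welch_t(146, N, 121, N)

# DLS trained with generative augmentation: 72.0% vs. 71.5%
debiased_t, debiased_df = welch_t(144, N, 143, N)

print(f"baseline: t = {baseline_t:.3f}")
print(f"debiased: t = {debiased_t:.3f}")
```

Under the assumed group size, the two statistics round to 2.670 and 0.111, matching the values quoted in the abstract. The P values would additionally require the t distribution's cumulative distribution function, which is omitted here to keep the sketch dependency-free.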
