Paper title
The EMory BrEast imaging Dataset (EMBED): A Racially Diverse, Granular Dataset of 3.5M Screening and Diagnostic Mammograms
Authors
Abstract
Developing and validating artificial intelligence models in medical imaging requires datasets that are large, granular, and diverse. To date, most publicly available breast imaging datasets fall short in one or more of these areas. Models trained on these data may therefore underperform on patient populations or pathologies they have not previously encountered. The EMory BrEast imaging Dataset (EMBED) addresses these gaps by providing 3,650,000 2D and DBT screening and diagnostic mammograms for 116,000 women, divided equally between White and African American patients. The dataset also contains 40,000 annotated lesions linked to structured imaging descriptors and 61 ground truth pathologic outcomes grouped into six severity classes. Our goal is to share this dataset with research partners to aid in the development and validation of breast AI models that will serve all patients fairly and help decrease bias in medical AI.