Paper title
Synthesizing lesions using contextual GANs improves breast cancer classification on mammograms
Paper authors
Paper abstract
Data scarcity and class imbalance are two fundamental challenges in many machine learning applications to healthcare. Breast cancer classification in mammography exemplifies these challenges: the malignancy rate in a screening population is around 0.5%, and the problem is compounded by the relatively small size of lesions (~1% of the image) in malignant cases. At the same time, the prevalence of screening mammography creates a potential abundance of non-cancer exams to use for training. Together, these characteristics lead to overfitting on cancer cases while under-utilizing non-cancer data. Here, we present a novel generative adversarial network (GAN) model for data augmentation that can realistically synthesize and remove lesions on mammograms. With self-attention and semi-supervised learning components, the U-net-based architecture can generate high-resolution (256x256 px) outputs, as necessary for mammography. When augmenting the original training set with the GAN-generated samples, we find a significant improvement in malignancy classification performance on a test set of real mammogram patches. Overall, the empirical results of our algorithm and its relevance to other medical imaging paradigms point to potentially fruitful further applications.
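To make the scale of the class imbalance concrete, the following minimal sketch works through the arithmetic implied by a 0.5% malignancy rate. It is an illustration only: the function name `synthetic_positives_needed`, the dataset size of 100,000 and the 50% target fraction are assumptions chosen for the example, not values reported by the authors, and the abstract does not state how many synthetic samples were actually added.

```python
import math


def synthetic_positives_needed(n_total, malignancy_rate, target_fraction):
    """Return (n_pos, n_neg, n_synth) for a hypothetical screening dataset.

    n_synth is the number of synthetic malignant samples that would have to
    be added so that malignant samples make up at least `target_fraction`
    of the augmented set. All inputs are illustrative assumptions.
    """
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must lie strictly between 0 and 1")
    n_pos = round(n_total * malignancy_rate)  # real malignant samples
    n_neg = n_total - n_pos                   # real non-cancer samples
    # Solve (n_pos + s) / (n_total + s) = target_fraction for s.
    s = (target_fraction * n_total - n_pos) / (1.0 - target_fraction)
    n_synth = max(0, math.ceil(s))            # no samples needed if already balanced
    return n_pos, n_neg, n_synth


if __name__ == "__main__":
    n_pos, n_neg, n_synth = synthetic_positives_needed(
        n_total=100_000, malignancy_rate=0.005, target_fraction=0.5
    )
    print(f"real malignant: {n_pos}, real non-cancer: {n_neg}, "
          f"synthetic malignant needed: {n_synth}")
```

Under these assumed numbers, only 500 of the 100,000 exams are malignant, so nearly all of the positive class in a balanced training set would have to come from augmentation. This is why the realism of synthesized lesions matters for the classifier.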