Paper Title

Heterogeneity Loss to Handle Intersubject and Intrasubject Variability in Cancer

Paper Authors

Shubham Goswami, Suril Mehta, Dhruva Sahrawat, Anubha Gupta, Ritu Gupta

Abstract

Developing nations lack an adequate number of hospitals with modern equipment and skilled doctors. Hence, a significant proportion of these nations' population, particularly in rural areas, is unable to access specialized and timely healthcare facilities. In recent years, deep learning (DL) models, a class of artificial intelligence (AI) methods, have shown impressive results in the medical domain. These AI methods can provide immense support to developing nations as affordable healthcare solutions. This work focuses on one such application: blood cancer diagnosis. However, DL models face challenges in cancer research because large datasets for adequate training are unavailable and because it is difficult to capture heterogeneity in data at different levels, ranging from acquisition characteristics and session to subject level (within subjects and across subjects). These challenges render DL models prone to overfitting, and hence the models generalize poorly to prospective subjects' data. In this work, we address these problems in the application of B-cell Acute Lymphoblastic Leukemia (B-ALL) diagnosis using deep learning. We propose a heterogeneity loss that captures subject-level heterogeneity, thereby forcing the neural network to learn subject-independent features. We also propose an unorthodox ensemble strategy that improves classification over models trained on 7 folds, giving a weighted $F_1$ score of 95.26% on unseen (test) subjects' data, which is, so far, the best result on the C-NMC 2019 dataset for B-ALL classification.
