Multi-Source Survival Domain Adaptation

Paper Title

Multi-Source Survival Domain Adaptation

Authors

Ammar Shaker, Carolin Lawrence

Abstract

Survival analysis is the branch of statistics that studies the relation between the characteristics of living entities and their respective survival times, taking into account the partial information held by censored cases. A good analysis can, for example, determine whether one medical treatment for a group of patients is better than another. With the rise of machine learning, survival analysis can be modeled as learning a function that maps studied patients to their survival times. To succeed with that, there are three crucial issues to be tackled. First, some patient data is censored: we do not know the true survival times for all patients. Second, data is scarce, which led past research to treat different illness types as domains in a multi-task setup. Third, there is the need for adaptation to new or extremely rare illness types, where few or no labels are available. In contrast to previous multi-task setups, we want to investigate how to efficiently adapt to a new survival target domain from multiple survival source domains. For this, we introduce a new survival metric and the corresponding discrepancy measure between survival distributions. These allow us to define domain adaptation for survival analysis while incorporating censored data, which would otherwise have to be dropped. Our experiments on two cancer data sets reveal a superb performance on target domains, a better treatment recommendation, and a weight matrix with a plausible explanation.
