Paper Title

Federated Survival Analysis with Discrete-Time Cox Models

Authors

Mathieu Andreux, Andre Manoel, Romuald Menuet, Charlie Saillard, Chloé Simpson

Abstract

Building machine learning models from decentralized datasets located in different centers with federated learning (FL) is a promising approach to circumvent local data scarcity while preserving privacy. However, the prominent Cox proportional hazards (PH) model, used for survival analysis, does not fit the FL framework, as its loss function is non-separable with respect to the samples. The naïve method to bypass this non-separability consists in calculating the losses per center, and minimizing their sum as an approximation of the true loss. We show that the resulting model may suffer from important performance loss in some adverse settings. Instead, we leverage the discrete-time extension of the Cox PH model to formulate survival analysis as a classification problem with a separable loss function. Using this approach, we train survival models using standard FL techniques on synthetic data, as well as real-world datasets from The Cancer Genome Atlas (TCGA), showing similar performance to a Cox PH model trained on aggregated data. Compared to previous works, the proposed method is more communication-efficient, more generic, and more amenable to using privacy-preserving techniques.
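The separability argument in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes one common discrete-time parameterization (a logistic hazard per interval, `logit h_k(x) = alpha_k + beta^T x`), random synthetic data, and two hypothetical centers; the function names `cox_loss` and `discrete_time_loss` are made up for this example. It compares each loss evaluated on pooled data with the sum of the same loss evaluated per center.

```python
import numpy as np

def cox_loss(beta, X, time, event):
    """Negative log partial likelihood of a Cox PH model (no special tie handling)."""
    risk = X @ beta
    # at_risk[i, j] is True when sample j is still at risk at the time of sample i
    at_risk = time[None, :] >= time[:, None]
    log_denom = np.log((at_risk * np.exp(risk)[None, :]).sum(axis=1))
    # Each event term depends on every sample in its risk set: non-separable
    return -np.sum(event * (risk - log_denom))

def discrete_time_loss(beta, alpha, X, interval, event):
    """Negative log-likelihood of an assumed logistic discrete-time hazard model."""
    logits = alpha[None, :] + (X @ beta)[:, None]   # shape (n, K)
    log_h = -np.logaddexp(0.0, -logits)             # log sigmoid(logits)
    log_1mh = -np.logaddexp(0.0, logits)            # log(1 - sigmoid(logits))
    k = np.arange(alpha.size)[None, :]
    before = k < interval[:, None]                  # intervals fully survived
    last = k == interval[:, None]                   # interval of event or censoring
    is_event = (event == 1)[:, None]
    # Convention assumed here: a censored sample survives its last interval
    per_sample = -(before * log_1mh).sum(axis=1) \
                 - (last * np.where(is_event, log_h, log_1mh)).sum(axis=1)
    return per_sample.sum()                         # plain sum over samples: separable

# Synthetic data and arbitrary parameters (all hypothetical)
rng = np.random.default_rng(0)
n, d, K = 200, 3, 5
X = rng.normal(size=(n, d))
time = rng.exponential(size=n)
event = rng.integers(0, 2, size=n)
edges = np.quantile(time, np.linspace(0, 1, K + 1)[1:-1])
interval = np.searchsorted(edges, time)             # interval index in 0..K-1
beta, alpha = rng.normal(size=d), rng.normal(size=K)
center = rng.integers(0, 2, size=n)                 # assignment to two centers

cox_pooled = cox_loss(beta, X, time, event)
cox_split = sum(cox_loss(beta, X[center == c], time[center == c], event[center == c])
                for c in (0, 1))
dt_pooled = discrete_time_loss(beta, alpha, X, interval, event)
dt_split = sum(discrete_time_loss(beta, alpha, X[center == c],
                                  interval[center == c], event[center == c])
               for c in (0, 1))

print("Cox pooled vs. sum of per-center losses:", cox_pooled, cox_split)
print("Discrete-time pooled vs. sum of per-center losses:", dt_pooled, dt_split)
```

Under these assumptions the two discrete-time values agree up to floating-point error, because the loss is a sum of per-sample terms. The Cox values differ, because each center only sees its own risk sets. That gap is what the naive per-center approximation ignores.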
