Towards the Practical Utility of Federated Learning in the Medical Domain

Paper Title

Towards the Practical Utility of Federated Learning in the Medical Domain

Authors

Seongjun Yang, Hyeonji Hwang, Daeyoung Kim, Radhika Dua, Jong-Yeup Kim, Eunho Yang, Edward Choi

Abstract

Federated learning (FL) is an active area of research. One of the most suitable areas for adopting FL is the medical domain, where patient privacy must be respected. Previous research, however, does not provide a practical guide to applying FL in the medical domain. We propose empirical benchmarks and experimental settings for three representative medical datasets with different modalities: longitudinal electronic health records, skin cancer images, and electrocardiogram signals. Likely users of FL, such as medical institutions and IT companies, can take these benchmarks as guides for adopting FL and minimize their trial and error. For each dataset, each client's data comes from a different source to preserve real-world heterogeneity. We evaluate six FL algorithms designed to address data heterogeneity among clients, and a hybrid algorithm combining the strengths of two representative FL algorithms. Based on experimental results from the three modalities, we find that simple FL algorithms tend to outperform more sophisticated ones, while the hybrid algorithm consistently shows good, if not the best, performance. We also find that frequent global model updates lead to better performance under a fixed training iteration budget. As the number of participating clients increases, higher costs are incurred due to the additional IT administrators and GPUs required, but performance consistently improves. We expect that future users will refer to these empirical benchmarks to design FL experiments in the medical domain suited to their clinical tasks and to obtain stronger performance at lower cost.
