Paper Title
Privacy-Preserving Technology to Help Millions of People: Federated Prediction Model for Stroke Prevention
Authors
Abstract
Preventing stroke and managing its associated risk factors has been a public health priority worldwide. Emerging artificial intelligence technology is increasingly being adopted to predict stroke. Because of privacy concerns, patient data are stored in distributed electronic health record (EHR) databases. These voluminous clinical datasets cannot be aggregated, which keeps AI technology from boosting the accuracy of stroke prediction through centralized training data. In this work, our scientists and engineers propose a privacy-preserving scheme to predict the risk of stroke and deploy our federated prediction model on cloud servers. Our federated prediction system asynchronously supports any number of client connections and an arbitrary number of local gradient iterations in each communication round. It adopts federated averaging during model training, and no patient data leave the hospitals at any point during training or prediction. With this privacy-preserving mechanism, our federated prediction model trains over all the healthcare data from the hospitals in a certain city without any actual data sharing among them. It is therefore not only secure but also more accurate than any single prediction model trained on the data of just one hospital. For small hospitals with few confirmed stroke cases in particular, our federated model improves model performance by 10% to 20% on several machine learning metrics. To help stroke experts grasp the advantage of our prediction system more intuitively, we developed a mobile app that collects key information from patients' statistics and shows performance comparisons between the federated prediction model and the single prediction model during the federated training process.
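The federated averaging step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the logistic regression model, the function names (`local_update`, `federated_average`, `train_federated`), the learning rate, and the toy data are all assumptions made for illustration. The sketch also runs the rounds synchronously for simplicity, whereas the abstract describes an asynchronous system. Each simulated hospital runs a few local gradient steps on its own data, and only the resulting weights, never the patient records, are sent for averaging.

```python
import math
from typing import List, Tuple

# A client's private dataset: (feature rows, binary labels). Hypothetical format.
Dataset = Tuple[List[List[float]], List[int]]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_update(weights: List[float], data: Dataset,
                 lr: float = 0.1, local_steps: int = 5) -> List[float]:
    # Run a few gradient-descent steps of logistic regression on one
    # client's data, starting from the current global weights.
    X, y = data
    w = list(weights)
    for _ in range(local_steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def federated_average(client_weights: List[List[float]],
                      client_sizes: List[int]) -> List[float]:
    # Average the client weights, weighting each client by its sample count.
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def train_federated(clients: List[Dataset], dim: int, rounds: int = 20,
                    lr: float = 0.1, local_steps: int = 5) -> List[float]:
    # Synchronous simplification: every client participates in every round.
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, d, lr, local_steps) for d in clients]
        sizes = [len(d[0]) for d in clients]
        global_w = federated_average(updates, sizes)
    return global_w


if __name__ == "__main__":
    # Toy data: first feature is a bias term, label is 1 when the second is positive.
    large_hospital: Dataset = ([[1.0, 2.0], [1.0, 1.5], [1.0, -1.0], [1.0, -2.5]],
                               [1, 1, 0, 0])
    small_hospital: Dataset = ([[1.0, 0.5], [1.0, -0.5]], [1, 0])
    print(train_federated([large_hospital, small_hospital], dim=2))
```

Weighting by sample count means a small hospital contributes in proportion to its data while still receiving a model shaped by every participant, which is the setting where the abstract reports the largest gains.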