Paper title
Synthetic Observational Health Data with GANs: from slow adoption to a boom in medical research and ultimately digital twins?
Paper authors
Paper abstract
After being collected for patient care, Observational Health Data (OHD) can further benefit patient well-being by sustaining the development of health informatics and medical research. Vast potential remains unexploited because of the fiercely private nature of patient-related data and the regulations that protect it. Generative Adversarial Networks (GANs) have recently emerged as a groundbreaking way to learn generative models that produce realistic synthetic data. They have revolutionized practices in multiple domains such as self-driving cars, fraud detection, digital twin simulations in industrial sectors, and medical imaging. The digital twin concept could readily apply to modelling and quantifying disease progression. In addition, GANs possess many capabilities relevant to common problems in healthcare: lack of data, class imbalance, rare diseases, and preserving privacy. Unlocking open access to privacy-preserving OHD could be transformative for scientific research. In the midst of COVID-19, the healthcare system is facing unprecedented challenges, many of which are data-related for the reasons stated above. Considering these facts, publications concerning GANs applied to OHD seemed to be severely lacking. To uncover the reasons for this slow adoption, we broadly reviewed the published literature on the subject. Our findings show that the properties of OHD were initially challenging for the existing GAN algorithms (unlike medical imaging, for which state-of-the-art models were directly transferable) and that the evaluation of synthetic data lacked clear metrics. We find more publications on the subject than expected, starting slowly in 2017 and appearing at an increasing rate since then. The difficulties of OHD remain, and we discuss issues relating to evaluation, consistency, benchmarking, data modelling, and reproducibility.