Cardea: An Open Automated Machine Learning Framework for Electronic Health Records

Paper title

Cardea: An Open Automated Machine Learning Framework for Electronic Health Records

Authors

Sarah Alnegheimish, Najat Alrashed, Faisal Aleissa, Shahad Althobaiti, Dongyu Liu, Mansour Alsaleh, Kalyan Veeramachaneni

Abstract

An estimated 180 papers focusing on deep learning and electronic health records (EHR) were published between 2010 and 2018. Although these publications share a common workflow structure, no trusted and verified software framework exists, forcing researchers to arduously repeat previous work. In this paper, we propose Cardea, an extensible open-source automated machine learning framework that encapsulates common prediction problems in the health domain and allows users to build predictive models with their own data. The system relies on two components: Fast Healthcare Interoperability Resources (FHIR), a standardized data structure for electronic health systems, and several AutoML frameworks for automated feature engineering, model selection, and tuning. We augment these components with an adaptive data assembler and comprehensive data- and model-auditing capabilities. We demonstrate our framework via 5 prediction tasks on MIMIC-III and Kaggle datasets, which highlight Cardea's human competitiveness, flexibility in problem definition, extensive feature generation capability, adaptable automatic data assembler, and usability.
