Paper Title

Deeply-Learned Generalized Linear Models with Missing Data

Authors

David K. Lim, Naim U. Rashid, Junier B. Oliva, Joseph G. Ibrahim

Abstract

Deep Learning (DL) methods have dramatically increased in popularity in recent years, with significant growth in their application to supervised learning problems in the biomedical sciences. However, the greater prevalence and complexity of missing data in modern biomedical datasets present significant challenges for DL methods. Here, we provide a formal treatment of missing data in the context of deeply-learned generalized linear models, a supervised DL architecture for regression and classification problems. We propose a new architecture, dlglm, that is one of the first to be able to flexibly account for both ignorable and non-ignorable patterns of missingness in input features and response at training time. We demonstrate through statistical simulation that our method outperforms existing approaches for supervised learning tasks in the presence of missing not at random (MNAR) missingness. We conclude with a case study of a Bank Marketing dataset from the UCI Machine Learning Repository, in which we predict whether clients subscribed to a product based on phone survey data. Supplementary materials for this article are available online.