Paper Title
NeuMiss networks: differentiable programming for supervised learning with missing values
Paper Authors
Paper Abstract
The presence of missing values makes supervised learning much more challenging. Indeed, previous work has shown that even when the response is a linear function of the complete data, the optimal predictor is a complex function of the observed entries and the missingness indicator. As a result, the computational or sample complexities of consistent approaches depend on the number of missing patterns, which can be exponential in the number of dimensions. In this work, we derive the analytical form of the optimal predictor under a linearity assumption and various missing data mechanisms, including Missing at Random (MAR) and self-masking (Missing Not At Random). Based on a Neumann-series approximation of the optimal predictor, we propose a new principled architecture, named NeuMiss networks. Their originality and strength come from the use of a new type of non-linearity: the multiplication by the missingness indicator. We provide an upper bound on the Bayes risk of NeuMiss networks, and show that they have good predictive accuracy with both a number of parameters and a computational complexity independent of the number of missing data patterns. As a result, they scale well to problems with many features, and remain statistically efficient for medium-sized samples. Moreover, we show that, contrary to procedures using EM or imputation, they are robust to the missing data mechanism, including difficult MNAR settings such as self-masking.
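To make the abstract's key idea concrete, here is a minimal NumPy sketch of a NeuMiss-style forward pass: a fixed-depth Neumann-series iteration whose only non-linearity is element-wise multiplication by the missingness indicator. All names, shapes, and the toy data are illustrative assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def neumiss_forward(x, m, W, W_out, b_out, depth):
    """Sketch of a NeuMiss-style forward pass.

    x     : inputs with missing entries zero-filled, shape (n, d)
    m     : observed-mask (1 = observed, 0 = missing), shape (n, d)
    W     : shared (d, d) weight matrix for the Neumann iterations
    depth : number of Neumann-series terms to unroll
    Each iteration applies a linear map and then multiplies by the
    mask -- the "multiplication by the missingness indicator"
    non-linearity described in the abstract.
    """
    x_obs = x * m                  # keep observed entries only
    h = x_obs
    for _ in range(depth):
        # one unrolled Neumann-series step, masked to observed entries
        h = (h @ W) * m + x_obs
    return h @ W_out + b_out       # final linear prediction head

# toy example: random features with ~30% of entries masked out
n, d = 8, 4
x = rng.normal(size=(n, d))
m = (rng.random((n, d)) > 0.3).astype(float)   # 1 = observed
W = rng.normal(scale=0.1, size=(d, d))
W_out = rng.normal(size=(d, 1))
y_hat = neumiss_forward(x, m, W, W_out, 0.0, depth=3)
print(y_hat.shape)   # (8, 1)
```

Note how the parameter count (one `(d, d)` matrix plus an output head) and the cost per sample depend only on the dimension `d` and the chosen depth, not on the number of distinct missingness patterns, which is the scaling property the abstract emphasizes.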