Paper title
Estimation of prediction error with known covariate shift
Paper authors
Paper abstract
In supervised learning, estimating prediction error on unlabeled test data is an important task. Existing methods usually assume that the training and test data are sampled from the same distribution, an assumption that is often violated in practice. As a result, traditional estimators such as cross-validation (CV) are biased, which may lead to poor model selection. In this paper, we assume access to a test dataset in which the feature values are available but the outcome labels are not, and we focus on a particular form of distributional shift called "covariate shift". We propose an alternative method based on a parametric bootstrap that targets the conditional error. Empirically, our method outperforms CV on both simulated and real data examples across different modeling tasks.
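As a rough illustration of the general idea, and not the paper's actual procedure, the sketch below estimates squared prediction error at a set of unlabeled test covariates with a parametric bootstrap. Everything specific here is an assumption made for the example: the Gaussian linear model for the outcome given the features, the least-squares learner, and the hypothetical helper names `fit_ols`, `predict`, and `bootstrap_test_error`.

```python
import numpy as np


def fit_ols(X, y):
    # Least-squares coefficients, with an intercept column prepended.
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta


def bootstrap_test_error(X_train, y_train, X_test, n_boot=500, seed=0):
    """Parametric-bootstrap estimate of mean squared error at X_test.

    Assumption (ours, for illustration): y given x is Gaussian around a
    linear mean, and this conditional law is the same for train and test.
    """
    rng = np.random.default_rng(seed)

    # Fit the assumed parametric model for y | x on the labeled data.
    beta_hat = fit_ols(X_train, y_train)
    resid = y_train - predict(beta_hat, X_train)
    sigma_hat = np.sqrt(resid @ resid / (len(y_train) - len(beta_hat)))

    mu_train = predict(beta_hat, X_train)
    mu_test = predict(beta_hat, X_test)

    errors = []
    for _ in range(n_boot):
        # Simulate training labels from the fitted model and refit the learner.
        y_star = mu_train + sigma_hat * rng.standard_normal(len(mu_train))
        beta_star = fit_ols(X_train, y_star)
        # Simulate labels at the *test* covariates from the same fitted model.
        y_test_star = mu_test + sigma_hat * rng.standard_normal(len(mu_test))
        errors.append(np.mean((y_test_star - predict(beta_star, X_test)) ** 2))
    return float(np.mean(errors))


# Usage: same labeled training data, two different sets of test covariates.
X_train = np.linspace(-1.0, 1.0, 40)
y_train = 1.0 + 2.0 * X_train + 0.5 * np.random.default_rng(1).standard_normal(40)

err_same = bootstrap_test_error(X_train, y_train, np.linspace(-1.0, 1.0, 40))
err_shifted = bootstrap_test_error(X_train, y_train, np.linspace(3.0, 5.0, 40))
print(err_same, err_shifted)
```

Because labels are simulated at the test covariates themselves, the estimate reflects where the test features lie. In this toy setup the estimate for the shifted covariates comes out larger than for covariates matching the training range, which is the kind of difference an estimator that only looks at the training distribution cannot see.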