Title

Maximum sampled conditional likelihood for informative subsampling

Authors

HaiYing Wang, Jae Kwang Kim

Abstract

Subsampling is a computationally effective approach to extract information from massive data sets when computing resources are limited. After a subsample is taken from the full data, most available methods use an inverse probability weighted (IPW) objective function to estimate the model parameters. The IPW estimator does not fully utilize the information in the selected subsample. In this paper, we propose to use the maximum sampled conditional likelihood estimator (MSCLE) based on the sampled data. We establish the asymptotic normality of the MSCLE and prove that its asymptotic variance-covariance matrix is the smallest among a class of asymptotically unbiased estimators, including the IPW estimator. We further discuss the asymptotic results with the L-optimal subsampling probabilities and illustrate the estimation procedure with generalized linear models. Numerical experiments are provided to evaluate the practical performance of the proposed method.