Paper title
Big Data and model-based survey sampling
Paper authors
Abstract
Big Data are huge amounts of digital information that are automatically accrued or merged from several sources and rarely result from properly planned surveys. A Big Dataset is herein conceived of as a collection of information concerning a finite population. We suggest selecting a sample of observations to achieve the inferential goal. We assume that a super-population model has generated the Big Dataset. Under this assumption, we can apply the theory of optimal design to draw a sample from the Big Dataset that contains most of the information about the unknown parameters.
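As an illustration of the idea of drawing an informative sample from a Big Dataset, the following is a minimal sketch and not the authors' procedure. It assumes a linear super-population model and uses the D-optimality criterion, neither of which the abstract specifies. The function name `greedy_d_optimal_subsample`, the `ridge` parameter, and the simulated data are hypothetical. The sketch greedily picks the rows of the design matrix that most increase the determinant of the information matrix.

```python
import numpy as np


def greedy_d_optimal_subsample(X, n, ridge=1e-4):
    """Greedily select n rows of X that approximately maximize det(Xs' Xs).

    Assumption (not from the source): a linear model y = X beta + error,
    so the information matrix of a subsample Xs is proportional to Xs' Xs.
    """
    X = np.asarray(X, dtype=float)
    N, p = X.shape
    if not 0 < n <= N:
        raise ValueError("n must satisfy 0 < n <= number of rows of X")

    # Start from a small ridge term so that the matrix is invertible.
    M_inv = np.eye(p) / ridge
    available = np.ones(N, dtype=bool)
    chosen = []

    for _ in range(n):
        # det(M + x x') = det(M) * (1 + x' M^{-1} x), so the best next
        # row is the available one with the largest x' M^{-1} x.
        gain = np.einsum("ij,ij->i", X @ M_inv, X)
        gain[~available] = -np.inf
        i = int(np.argmax(gain))
        chosen.append(i)
        available[i] = False

        # Sherman-Morrison rank-one update of the inverse.
        v = M_inv @ X[i]
        M_inv -= np.outer(v, v) / (1.0 + X[i] @ v)

    return np.array(chosen)


# Simulated "Big Dataset": intercept plus one covariate on [-1, 1].
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=10_000)
X = np.column_stack([np.ones_like(x), x])

idx = greedy_d_optimal_subsample(X, n=20)
rand_idx = rng.choice(len(X), size=20, replace=False)

det_selected = np.linalg.det(X[idx].T @ X[idx])
det_random = np.linalg.det(X[rand_idx].T @ X[rand_idx])
print(det_selected, det_random)
```

In this toy setting the selected observations concentrate near the extremes of the covariate range, which is where a straight-line model is estimated most precisely, so the selected subsample carries more information about the parameters than a simple random sample of the same size.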