Paper Title

Two-Phase Data Synthesis for Income: An Application to the NHIS

Authors

Kevin Ros, Henrik Olsson, Jingchen Hu

Abstract

We propose a two-phase synthesis process for synthesizing income, a sensitive variable that is usually highly skewed and has a number of reported zeros. We consider two forms of a continuous income variable: a binary form, which is modeled and synthesized in phase 1; and a non-negative continuous form, which is modeled and synthesized in phase 2. Bayesian synthesis models are proposed for the two-phase synthesis process, and other synthesis models are readily implementable. We demonstrate our methods with applications to a sample from the National Health Interview Survey (NHIS). Utility and risk profiles of the generated synthetic datasets are evaluated and compared to results from a single-phase synthesis process.
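
The two-phase idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model: it ignores covariates, uses a Beta posterior under an assumed uniform prior for the phase 1 indicator, and uses a plug-in log-normal fit for phase 2 as a stand-in for the Bayesian synthesis models the paper proposes. The function name `synthesize_income` and the toy data are hypothetical.

```python
import math
import random
import statistics


def synthesize_income(incomes, rng):
    """Return one synthetic copy of `incomes` using a two-phase scheme.

    Illustrative assumptions (not from the paper): no covariates, a
    Beta(1, 1) prior in phase 1, and a plug-in log-normal fit in phase 2.
    """
    n = len(incomes)
    positives = [y for y in incomes if y > 0]
    k = len(positives)

    # Phase 1: binary form (is income positive?).
    # Draw the probability of a positive income from its Beta posterior,
    # then synthesize an indicator for every record.
    p = rng.betavariate(1 + k, 1 + n - k)
    indicators = [rng.random() < p for _ in range(n)]

    # Phase 2: non-negative continuous form, drawn only for records whose
    # synthesized indicator is positive; all other records stay at zero.
    logs = [math.log(y) for y in positives]
    mu = statistics.mean(logs)
    sigma = statistics.stdev(logs)
    return [math.exp(rng.gauss(mu, sigma)) if z else 0.0 for z in indicators]


rng = random.Random(0)
# Toy "confidential" data, made up for illustration: about 30% zeros
# and right-skewed positive values.
confidential = [
    0.0 if rng.random() < 0.3 else math.exp(rng.gauss(10.0, 1.0))
    for _ in range(1000)
]
synthetic = synthesize_income(confidential, rng)
```

Splitting the synthesis this way lets the zeros be reproduced exactly as zeros, while the skewed positive part is modeled on the log scale; a single continuous model would have to handle both features at once.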
