Paper Title
Bayesian Causal Inference with Bipartite Record Linkage
Paper Authors
Paper Abstract
In many scenarios, the observational data needed for causal inferences are spread over two data files. In particular, we consider scenarios where one file includes covariates and the treatment measured on one set of individuals, and a second file includes responses measured on another, partially overlapping set of individuals. In the absence of error-free direct identifiers like social security numbers, straightforward merging of the separate files is not feasible, so records must be linked using error-prone variables such as names, birth dates, and demographic characteristics. Typical practice in such situations follows a two-stage procedure: first link the two files using a probabilistic linkage technique, then make causal inferences with the linked dataset. This procedure does not propagate uncertainty due to imperfect linkages to the causal inference, nor does it leverage relationships among the study variables to improve the quality of the linkages. We propose a hierarchical model for simultaneous Bayesian inference on probabilistic linkage and causal effects that addresses these deficiencies. Using simulation studies and theoretical arguments, we show that the hierarchical model can improve the accuracy of the estimated treatment effects, as well as of the record linkages, compared to the two-stage modeling option. We illustrate the hierarchical model using a causal study of the effects of debit card possession on household spending.
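The cost of treating stage-one links as if they were correct can be illustrated with a small simulation. The sketch below is not the hierarchical model proposed in the paper. It is a toy version of the two-stage procedure under several assumptions that do not come from the abstract: every individual appears in both files (no partial overlap), treatment is randomized with probability 0.5, the outcome is `tau * treatment` plus standard normal noise, and a fixed fraction of links is wrong, with the wrong links permuted at random among themselves. The function name `simulate_two_stage` and all of its parameters are hypothetical.

```python
import random
from statistics import mean


def simulate_two_stage(n=2000, tau=2.0, false_link_rate=0.3, seed=0):
    """Return (estimate with true links, estimate with error-prone links).

    Toy setup (all assumptions): file A holds the treatment, file B holds
    the outcome, and both files cover the same n individuals.
    """
    rng = random.Random(seed)

    # File A: binary treatment for each individual.
    treatment = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    # File B: outcome with true treatment effect tau and unit-variance noise.
    outcome = [tau * t + rng.gauss(0.0, 1.0) for t in treatment]

    # Stage 1 (stand-in for probabilistic linkage): record i of file A is
    # linked to record link[i] of file B. A random subset of links is wrong;
    # the wrong links are shuffled among themselves.
    link = list(range(n))
    wrong = [i for i in range(n) if rng.random() < false_link_rate]
    shuffled = wrong[:]
    rng.shuffle(shuffled)
    for i, j in zip(wrong, shuffled):
        link[i] = j

    # Stage 2: difference in mean outcomes, taking the links as given.
    def diff_in_means(links):
        treated = [outcome[links[i]] for i in range(n) if treatment[i] == 1]
        control = [outcome[links[i]] for i in range(n) if treatment[i] == 0]
        return mean(treated) - mean(control)

    return diff_in_means(list(range(n))), diff_in_means(link)


if __name__ == "__main__":
    oracle, two_stage = simulate_two_stage()
    print(f"true links: {oracle:.2f}   error-prone links: {two_stage:.2f}")
```

In this setup a falsely linked outcome is unrelated to the treatment it is paired with, so the expected two-stage estimate shrinks toward zero, to roughly `(1 - false_link_rate) * tau`. The stage-two analysis does not account for this unless linkage uncertainty is carried into the causal model.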