Paper title
Collaborative causal inference with a distributed data-sharing management
Paper authors
Paper abstract
Data sharing barriers are paramount challenges arising from multicenter clinical trials where multiple data sources are stored in a distributed fashion at different local study sites. Merging such data sources into a common data storage for a centralized statistical analysis requires a data use agreement, which is often time-consuming. Data merging may become more burdensome when causal inference is of primary interest because propensity score modeling involves combining many confounding variables, and systematic incorporation of this additional modeling in meta-analysis has not been thoroughly investigated in the literature. We propose a new causal inference framework that avoids the merging of subject-level raw data from multiple sites but needs only the sharing of summary statistics. The proposed collaborative inference enjoys maximal protection of data privacy and minimal sensitivity to unbalanced data distributions across data sources. We show theoretically and numerically that the new distributed causal inference approach has little loss of statistical power compared to the centralized method that requires merging the entire data. We present large-sample properties and algorithms for the proposed method. We illustrate its performance by simulation experiments and a real-world data example on a multicenter clinical trial of basal insulin treatment for reducing the risk of post-transplantation diabetes among kidney-transplant patients.
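The idea of sharing only summary statistics can be illustrated with a small sketch. This is not the paper's estimator, whose details the abstract does not give; it is a generic stand-in that assumes a single binary confounder, a Hájek-type inverse-probability-weighted estimate at each site, and inverse-variance pooling at the coordinator. All function names, the simulated data, and the effect size of 2.0 are hypothetical.

```python
import random


def site_summary(rows):
    """Run locally at one site. rows: list of (x, t, y) with binary confounder x,
    binary treatment t, and outcome y. Returns (effect estimate, variance estimate),
    the only two numbers that leave the site in this sketch."""
    # Propensity score e(x) = P(T=1 | X=x), estimated as a within-stratum proportion.
    # Assumes every stratum contains both treated and untreated subjects.
    e = {}
    for level in (0, 1):
        stratum = [t for x, t, _ in rows if x == level]
        e[level] = sum(stratum) / len(stratum)
    # Normalized inverse-probability weights for the treated and untreated arms.
    w1 = [t / e[x] for x, t, _ in rows]
    w0 = [(1 - t) / (1 - e[x]) for x, t, _ in rows]
    mu1 = sum(w * y for w, (_, _, y) in zip(w1, rows)) / sum(w1)
    mu0 = sum(w * y for w, (_, _, y) in zip(w0, rows)) / sum(w0)
    # Plug-in variance that treats e(x) as known (an approximation).
    n = len(rows)
    phi = [a * (y - mu1) - b * (y - mu0) for a, b, (_, _, y) in zip(w1, w0, rows)]
    var = sum(p * p for p in phi) / n ** 2
    return mu1 - mu0, var


def combine(summaries):
    """Run at the coordinator: inverse-variance weighted average of site estimates."""
    weights = [1.0 / v for _, v in summaries]
    total = sum(weights)
    est = sum(w * est_k for w, (est_k, _) in zip(weights, summaries)) / total
    return est, 1.0 / total


def simulate_site(n, p_x, rng, effect=2.0):
    """Hypothetical data: x drives both treatment assignment and outcome."""
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < p_x else 0
        t = 1 if rng.random() < (0.7 if x else 0.3) else 0
        y = effect * t + 3.0 * x + rng.gauss(0.0, 1.0)
        rows.append((x, t, y))
    return rows


rng = random.Random(0)
# Three sites with deliberately unbalanced confounder distributions.
sites = [simulate_site(n, p, rng) for n, p in [(2000, 0.2), (3000, 0.5), (1500, 0.8)]]
summaries = [site_summary(rows) for rows in sites]
ate, var = combine(summaries)
print(f"pooled effect estimate: {ate:.3f}, standard error: {var ** 0.5:.3f}")
```

Only the pair returned by `site_summary` crosses site boundaries here; subject-level rows never do. Whether this simple pooling matches the efficiency of a centralized analysis is exactly the kind of question the paper addresses, and the sketch makes no claim about it.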