Paper Title
Directed Graphical Models and Causal Discovery for Zero-Inflated Data
Authors
Abstract
Modern RNA sequencing technologies provide gene expression measurements from single cells that promise refined insights on regulatory relationships among genes. Directed graphical models are well-suited to explore such (cause-effect) relationships. However, statistical analyses of single cell data are complicated by the fact that the data often show zero-inflated expression patterns. To address this challenge, we propose directed graphical models that are based on Hurdle conditional distributions parametrized in terms of polynomials in parent variables and their 0/1 indicators of being zero or nonzero. While directed graphs for Gaussian models are only identifiable up to an equivalence class in general, we show that, under a natural and weak assumption, the exact directed acyclic graph of our zero-inflated models can be identified. We propose methods for graph recovery, apply our model to real single-cell RNA-seq data on T helper cells, and show simulated experiments that validate the identifiability and graph estimation methods in practice.
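As a rough illustration of what a zero-inflated (hurdle-type) conditional distribution looks like, the sketch below simulates a single parent-child pair. It is not the paper's exact parametrization: the function names, the coefficients, and the choice of a logistic zero/nonzero gate followed by a Gaussian for the nonzero part are all assumptions made for illustration. The only features taken from the abstract are that the child's distribution depends on polynomials (here of degree 1) in the parent value and in its 0/1 indicator of being nonzero.

```python
import numpy as np


def sample_hurdle_child(x, rng, a=-0.5, b=1.0, c=0.3, m0=1.0, m1=0.5, sigma=1.0):
    """Draw a zero-inflated child Y given parent values x.

    Hypothetical parametrization (all coefficients are illustrative):
      P(Y != 0 | x) = sigmoid(a + b * 1{x != 0} + c * x)
      Y | Y != 0, x ~ Normal(m0 + m1 * x, sigma^2)
    """
    x = np.asarray(x, dtype=float)
    v = (x != 0).astype(float)              # 0/1 indicator: parent is nonzero
    logit = a + b * v + c * x               # degree-1 polynomial in (v, x)
    p_nonzero = 1.0 / (1.0 + np.exp(-logit))
    is_nonzero = rng.random(x.shape) < p_nonzero
    mean = m0 + m1 * x                      # mean of the nonzero part
    # Where the gate is closed, the child is exactly zero.
    return np.where(is_nonzero, rng.normal(mean, sigma), 0.0)


def sample_pair(n, seed=0):
    """Simulate n cells from the two-node graph X -> Y (assumed toy model)."""
    rng = np.random.default_rng(seed)
    # Root node: exactly zero with probability 0.4, otherwise Normal(2, 1).
    x = np.where(rng.random(n) < 0.6, rng.normal(2.0, 1.0, n), 0.0)
    y = sample_hurdle_child(x, rng)
    return x, y


if __name__ == "__main__":
    x, y = sample_pair(5000)
    print("fraction of zeros in X:", np.mean(x == 0))
    print("fraction of zeros in Y:", np.mean(y == 0))
```

In this toy model the child is nonzero more often when the parent is nonzero, so part of the dependence is carried by the zero pattern alone. The abstract's identifiability claim concerns the authors' Hurdle models under their stated assumption; this sketch does not demonstrate that result.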