Paper Title

Kernel Partial Correlation Coefficient -- a Measure of Conditional Dependence

Authors

Zhen Huang, Nabarun Deb, Bodhisattva Sen

Abstract

In this paper we propose and study a class of simple, nonparametric, yet interpretable measures of conditional dependence between two random variables $Y$ and $Z$ given a third variable $X$, all taking values in general topological spaces. The population version of any of these measures captures the strength of conditional dependence and it is 0 if and only if $Y$ and $Z$ are conditionally independent given $X$, and 1 if and only if $Y$ is a measurable function of $Z$ and $X$. Thus, our measure -- which we call kernel partial correlation (KPC) coefficient -- can be thought of as a nonparametric generalization of the partial correlation coefficient that possesses the above properties when $(X,Y,Z)$ is jointly normal. We describe two consistent methods of estimating KPC. Our first method utilizes the general framework of geometric graphs, including $K$-nearest neighbor graphs and minimum spanning trees. A sub-class of these estimators can be computed in near linear time and converges at a rate that automatically adapts to the intrinsic dimension(s) of the underlying distribution(s). Our second strategy involves direct estimation of conditional mean embeddings using cross-covariance operators in the reproducing kernel Hilbert spaces. Using these empirical measures we develop forward stepwise (high-dimensional) nonlinear variable selection algorithms. We show that our algorithm, using the graph-based estimator, yields a provably consistent model-free variable selection procedure, even in the high-dimensional regime when the number of covariates grows exponentially with the sample size, under suitable sparsity assumptions. Extensive simulation and real-data examples illustrate the superior performance of our methods compared to existing procedures. The recent conditional dependence measure proposed by Azadkia and Chatterjee (2019) can be viewed as a special case of our general framework.
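
To make the graph-based idea in the abstract concrete, here is a minimal sketch of a nearest-neighbor estimator in this spirit. The abstract does not state the estimator's formula, so everything below is an assumption made for illustration: the use of 1-nearest-neighbor graphs, a Gaussian kernel on $Y$, the brute-force distance computation, and the hypothetical names `nearest_neighbor`, `gaussian_kernel`, `knn_kpc`, and `bandwidth`. It compares how similar $Y_i$ is to the $Y$ value of its nearest neighbor in $(X, Z)$ versus its nearest neighbor in $X$ alone, and normalizes by the largest possible gain.

```python
import numpy as np


def nearest_neighbor(points):
    """Return, for each row, the index of its nearest other row (Euclidean)."""
    sq = np.sum(points**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argmin(d2, axis=1)


def gaussian_kernel(a, b, bandwidth=1.0):
    """Row-wise Gaussian kernel k(a_i, b_i); note k(a_i, a_i) = 1."""
    diff = a - b
    return np.exp(-np.sum(diff**2, axis=1) / (2.0 * bandwidth**2))


def knn_kpc(y, z, x, bandwidth=1.0):
    """Illustrative 1-NN estimate of conditional dependence of Y and Z given X.

    Assumed form (not taken from the abstract):
        (mean k(Y_i, Y_nn_xz(i)) - mean k(Y_i, Y_nn_x(i)))
        / (1 - mean k(Y_i, Y_nn_x(i)))
    """
    y, z, x = (np.asarray(v, dtype=float).reshape(len(v), -1) for v in (y, z, x))
    nn_x = nearest_neighbor(x)                    # neighbors using X only
    nn_xz = nearest_neighbor(np.hstack([x, z]))   # neighbors using (X, Z)
    k_x = gaussian_kernel(y, y[nn_x], bandwidth).mean()
    k_xz = gaussian_kernel(y, y[nn_xz], bandwidth).mean()
    return (k_xz - k_x) / (1.0 - k_x)


# Synthetic check with made-up data (not from the paper).
rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
z = rng.normal(size=n)
noise = rng.normal(size=n)

dependent = knn_kpc(x + z, z, x)        # Y is a function of (X, Z)
independent = knn_kpc(x + noise, z, x)  # Y is independent of Z given X
print(f"dependent: {dependent:.3f}, independent: {independent:.3f}")
```

On this synthetic data the first value should land close to 1 and the second close to 0, mirroring the two boundary cases described in the abstract. The brute-force neighbor search costs quadratic time; the near-linear running time mentioned in the abstract would require a proper nearest-neighbor data structure, which this sketch omits.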
