Paper Title

A Nonparametric Test of Dependence Based on Ensemble of Decision Trees

Author

Mahdi, Rami

Abstract

In this paper, a robust non-parametric measure of statistical dependence, or correlation, between two random variables is presented. The proposed coefficient is a permutation-like statistic that quantifies how well the observed sample S_n = {(X_i, Y_i), i = 1, ..., n} can be discriminated from the permuted sample Ŝ_nn = {(X_i, Y_j), i, j = 1, ..., n}, in which the two variables are independent. The extent of discriminability is determined from the predictions on the interchangeable leave-out sample, obtained by training an aggregate of decision trees to discriminate between the two samples without materializing the permuted sample. The proposed coefficient is computationally efficient, interpretable, invariant to monotonic transformations, and has a well-approximated distribution under independence. Empirical results show that the proposed method has high power for detecting complex relationships in noisy data.
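The idea of comparing the observed sample with the permuted sample, without ever building the n × n permuted pairs, can be illustrated with a small sketch. This is not the paper's algorithm. It rests on one assumption made here for illustration: a decision-tree leaf is an axis-aligned box, so the number of permuted pairs (X_i, Y_j) inside a box is the product of two marginal counts. The function names and the example box are hypothetical.

```python
import random


def in_box(x, y, box):
    """True if (x, y) lies in the half-open box (x_lo, x_hi, y_lo, y_hi)."""
    x_lo, x_hi, y_lo, y_hi = box
    return x_lo <= x < x_hi and y_lo <= y < y_hi


def observed_fraction(xs, ys, box):
    """Fraction of the n observed pairs (x_i, y_i) that fall in the box."""
    hits = sum(1 for x, y in zip(xs, ys) if in_box(x, y, box))
    return hits / len(xs)


def permuted_fraction(xs, ys, box):
    """Fraction of the n*n permuted pairs (x_i, y_j) that fall in the box.

    Uses marginal counts only, so the n*n sample is never materialized.
    """
    x_lo, x_hi, y_lo, y_hi = box
    nx = sum(1 for x in xs if x_lo <= x < x_hi)
    ny = sum(1 for y in ys if y_lo <= y < y_hi)
    return (nx * ny) / (len(xs) * len(ys))


def permuted_fraction_brute_force(xs, ys, box):
    """Same quantity computed by enumerating all n*n pairs (for checking only)."""
    hits = sum(1 for x in xs for y in ys if in_box(x, y, box))
    return hits / (len(xs) * len(ys))


if __name__ == "__main__":
    # Illustrative noisy, non-monotonic relationship (assumed data, not from the paper).
    rng = random.Random(0)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
    ys = [x * x + rng.gauss(0.0, 0.05) for x in xs]
    box = (-0.3, 0.3, -1.0, 0.1)  # hypothetical leaf region
    print("observed:", observed_fraction(xs, ys, box))
    print("permuted:", permuted_fraction(xs, ys, box))
```

When the two fractions differ markedly in some box, a classifier can tell the observed pairs from the permuted ones in that region, which is the sense in which the two samples are discriminable. Because the counts depend only on which points fall on each side of a threshold, applying a strictly increasing transformation to X or Y (and to the box bounds) leaves them unchanged.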
