Paper Title

On Variance Estimation of Random Forests with Infinite-Order U-statistics

Authors

Tianning Xu, Ruoqing Zhu, Xiaofeng Shao

Abstract

Infinite-order U-statistics (IOUS) have been used extensively in subbagging ensemble learning algorithms such as random forests to quantify their uncertainty. While normality results for IOUS have been studied extensively, their variance estimation approaches and theoretical properties remain mostly unexplored. Existing approaches mainly rely on the leading-term dominance property of the Hoeffding decomposition. However, this view usually leads to biased estimation when the kernel size is large or the sample size is small. On the other hand, while several unbiased estimators exist in the literature, their relationships and theoretical properties, especially ratio consistency, have never been studied. As a result, confidence intervals constructed from these estimators come with no performance guarantee. To bridge these gaps in the literature, we propose a new view of the Hoeffding decomposition for variance estimation that leads to an unbiased estimator. Instead of leading-term dominance, our view utilizes the dominance of the peak region. Moreover, we establish the connection and equivalence of our estimator with several existing unbiased variance estimators. Theoretically, we are the first to establish the ratio consistency of such a variance estimator, which justifies the coverage rate of confidence intervals constructed from random forests. Numerically, we further propose a local smoothing procedure to improve the estimator's finite-sample performance. Extensive simulation studies show that our estimators enjoy lower bias and achieve the targeted coverage rates.
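For readers unfamiliar with the Hoeffding decomposition mentioned in the abstract, the block below sketches the classical variance formula for a U-statistic. The notation ($n$, $k$, $h$, $\xi_{d,k}^2$) is assumed here for illustration and is not taken from the abstract. The closing remark on the "peak region" is one plausible reading of that term, not the authors' stated definition.

```latex
% Assumed setup: U_n averages a symmetric kernel h over all size-k
% subsamples S of n i.i.d. observations, with k <= n.
\[
  U_n = \binom{n}{k}^{-1} \sum_{S \subseteq \{1,\dots,n\},\ |S| = k} h(S).
\]
% Classical variance formula (Hoeffding, 1948):
\[
  \operatorname{Var}(U_n)
    = \sum_{d=1}^{k}
      \frac{\binom{k}{d}\binom{n-k}{k-d}}{\binom{n}{k}}\,
      \xi_{d,k}^{2},
  \qquad
  \xi_{d,k}^{2} = \operatorname{Cov}\bigl(h(S_1),\, h(S_2)\bigr)
  \ \text{with}\ |S_1 \cap S_2| = d .
\]
% Leading-term view: when k is small relative to n, the d = 1 term dominates,
\[
  \operatorname{Var}(U_n) \approx \frac{k^{2}}{n}\,\xi_{1,k}^{2}.
\]
% The weights above are hypergeometric probabilities with mean k^2 / n,
% so for large k their mass sits near d = k^2 / n rather than at d = 1.
```

Under this reading, the leading-term approximation degrades when $k^2/n$ is not small, which matches the abstract's remark about bias for large kernel sizes or small samples.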
