Random Planted Forest: a directly interpretable tree ensemble

Paper title

Random Planted Forest: a directly interpretable tree ensemble

Authors

Munir Hiabu, Enno Mammen, Joseph T. Meyer

Abstract

We introduce a novel interpretable tree-based algorithm for prediction in a regression setting. Our motivation is to estimate the unknown regression function from a functional decomposition perspective in which the functional components correspond to lower-order interaction terms. The idea is to modify the random forest algorithm by keeping certain leaves after they are split instead of deleting them. This leads to non-binary trees, which we refer to as planted trees. An extension to a forest leads to our random planted forest algorithm. Additionally, the maximum number of covariates that can interact within a leaf can be bounded. If we set this interaction bound to one, the resulting estimator is a sum of one-dimensional functions. In the other extreme case, if we do not set a limit, the resulting estimator and corresponding model place no restrictions on the form of the regression function. In a simulation study we find encouraging prediction and visualisation properties of our random planted forest method. We also develop theory for an idealized version of random planted forests in cases where the interaction bound is low. We show that if the interaction bound is smaller than three, the idealized version achieves asymptotically optimal convergence rates up to a logarithmic factor. Code is available on GitHub: https://github.com/PlantedML/randomPlantedForest.
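
The functional decomposition mentioned in the abstract can be written out as a short sketch. The notation here is an assumption made for illustration and is not taken from the abstract: $m$ denotes the regression function, $d$ the number of covariates, $r$ the interaction bound, and $m_u$ the component that depends only on the covariates indexed by $u$.

```latex
% Sketch of a functional decomposition with interaction bound r
% (symbols m, d, r, m_u are illustrative assumptions, not the authors' notation)
\begin{align*}
m(x) &= m_0 + \sum_{k=1}^{d} m_k(x_k) + \sum_{k<l} m_{kl}(x_k, x_l) + \cdots \\
     &= \sum_{u \subseteq \{1,\dots,d\},\; |u| \le r} m_u(x_u).
\end{align*}
% r = 1: only the constant and the one-dimensional terms remain,
%        so the estimator is a sum of one-dimensional functions:
%        m(x) = m_0 + \sum_{k=1}^{d} m_k(x_k).
% r = d: every subset u is allowed, so no restriction is placed on m.
```

Under this reading, bounding the number of covariates that can interact within a leaf corresponds to truncating the sum at $|u| \le r$, which is why the two extreme cases in the abstract give an additive model and an unrestricted model respectively.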
