Title

To Bag is to Prune

Author

Coulombe, Philippe Goulet

Abstract

It is notoriously difficult to build a bad Random Forest (RF). Concurrently, RF blatantly overfits in-sample without any apparent consequence out-of-sample. Standard arguments, like the classic bias-variance trade-off or double descent, cannot rationalize this paradox. I propose a new explanation: bootstrap aggregation and model perturbation as implemented by RF automatically prune a latent "true" tree. More generally, randomized ensembles of greedily optimized learners implicitly perform optimal early stopping out-of-sample. So there is no need to tune the stopping point. By construction, novel variants of Boosting and MARS are also eligible for automatic tuning. I empirically demonstrate the property, with simulated and real data, by reporting that these new completely overfitting ensembles perform similarly to their tuned counterparts -- or better.
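The abstract's central claim, that an ensemble of learners which each fully overfit their own bootstrap sample can still generalize well, can be illustrated with a toy sketch. This is my own minimal construction, not the paper's experiments: the base learner here is a 1-nearest-neighbor regressor (which interpolates its training sample exactly, i.e., overfits completely), and bagging many such learners is compared against a single one.

```python
import math
import random

random.seed(0)

def one_nn_predict(train, x):
    # 1-NN regression: return the y of the nearest training point.
    # On its own training set this interpolates exactly (zero train error).
    return min(train, key=lambda p: abs(p[0] - x))[1]

def make_data(n, noise_sd=1.0):
    # Noisy sine curve: y = sin(x) + Gaussian noise.
    xs = [random.uniform(0.0, 6.0) for _ in range(n)]
    return [(x, math.sin(x) + random.gauss(0.0, noise_sd)) for x in xs]

def mse(predict, data):
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

train, test = make_data(200), make_data(200)

# A single fully overfit learner: zero in-sample error by construction.
single_train_mse = mse(lambda x: one_nn_predict(train, x), train)
single_test_mse = mse(lambda x: one_nn_predict(train, x), test)

# Bagged ensemble: average B overfit 1-NN learners,
# each fit on its own bootstrap resample of the training set.
B = 50
boots = [[random.choice(train) for _ in train] for _ in range(B)]

def bagged_predict(x):
    return sum(one_nn_predict(b, x) for b in boots) / B

bag_test_mse = mse(bagged_predict, test)
```

Despite every base learner memorizing its sample, the bagged average smooths the fit, so the ensemble's out-of-sample error falls below that of the single overfit learner, with no stopping point or depth parameter tuned anywhere.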
