Paper Title

Sparsity in Optimal Randomized Classification Trees

Paper Authors

Rafael Blanquero, Emilio Carrizosa, Cristina Molero-Río, Dolores Romero Morales

Paper Abstract

Decision trees are popular classification and regression tools and, when small-sized, easy to interpret. Traditionally, a greedy approach has been used to build the trees, yielding a very fast training process; however, controlling sparsity (a proxy for interpretability) is challenging. In recent studies, optimal decision trees, where all decisions are optimized simultaneously, have shown better learning performance, especially when oblique cuts are implemented. In this paper, we propose a continuous optimization approach to build sparse optimal classification trees, based on oblique cuts, with the aim of using fewer predictor variables in each cut as well as along the whole tree. Both types of sparsity, namely local and global, are modeled by means of regularizations with polyhedral norms. The computational experience reported supports the usefulness of our methodology. In all our data sets, local and global sparsity can be improved without harming classification accuracy. Unlike greedy approaches, ours can easily trade in some classification accuracy for a gain in global sparsity.
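To make the regularization scheme in the abstract concrete, the following is a minimal sketch of a sparsity-regularized objective consistent with it; the notation (expected misclassification cost $g$, decision variables $\mu$ such as intercepts, cut coefficients $a_{jt}$ for predictor $j = 1, \dots, p$ at branch node $t \in \tau_B$, and weights $\lambda^{L}, \lambda^{G}$) is illustrative and not quoted from the paper:

\[
\min_{a,\,\mu}\; g(a,\mu)
\;+\; \lambda^{L} \sum_{t \in \tau_B} \sum_{j=1}^{p} \lvert a_{jt} \rvert
\;+\; \lambda^{G} \sum_{j=1}^{p} \max_{t \in \tau_B} \lvert a_{jt} \rvert
\]

The $\ell_1$ term drives individual coefficients within each oblique cut to zero (local sparsity), while the $\ell_\infty$-based term per predictor vanishes only if that predictor is dropped from every cut in the tree simultaneously (global sparsity); both penalties are polyhedral norms, as stated in the abstract.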
