Paper Title
Quant-BnB: A Scalable Branch-and-Bound Method for Optimal Decision Trees with Continuous Features
Paper Authors
Paper Abstract
Decision trees are among the most useful and popular methods in the machine learning toolbox. In this paper, we consider the problem of learning optimal decision trees, a combinatorial optimization problem that is challenging to solve at scale. A common approach in the literature is to use greedy heuristics, which may not be optimal. Recently, there has been significant interest in learning optimal decision trees using various approaches (e.g., based on integer programming or dynamic programming). To achieve computational scalability, most of these approaches focus on classification tasks with binary features. In this paper, we present a new discrete optimization method based on branch-and-bound (BnB) to obtain optimal decision trees. Unlike existing customized approaches, we consider both regression and classification tasks with continuous features. The basic idea underlying our approach is to split the search space based on the quantiles of the feature distribution, which leads to upper and lower bounds for the underlying optimization problem along the BnB iterations. Our proposed algorithm, Quant-BnB, shows significant speedups compared to existing approaches for shallow optimal trees on various real datasets.
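To make the quantile-splitting idea concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a single continuous feature with distinct values, a depth-1 regression tree (a stump), and squared-error loss. The function names `prefix_sse` and `quantile_bnb_stump` and the parameter `s` (the number of sub-intervals per split) are hypothetical, introduced only for illustration. Each interval of candidate split positions gets an upper bound from evaluating its endpoints. It also gets a lower bound from the points that fall on the same side for every split in the interval. An interval is discarded when its lower bound cannot beat the best loss found so far.

```python
import numpy as np


def prefix_sse(y):
    """Return sse(i, j): the sum of squared errors of y[i:j] around its mean."""
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y * y)))

    def sse(i, j):
        n = j - i
        if n <= 0:
            return 0.0
        total = s1[j] - s1[i]
        # Clip at zero to absorb tiny negative values from rounding.
        return max(s2[j] - s2[i] - total * total / n, 0.0)

    return sse


def quantile_bnb_stump(x, y, s=4):
    """Best single-feature regression stump via a simplified quantile-based BnB.

    Assumes the values in x are distinct. A split position k sends the k
    smallest points (by x) to the left leaf and the rest to the right leaf.
    Returns (threshold, loss).
    """
    if s < 2:
        raise ValueError("s must be at least 2")
    order = np.argsort(x)
    xs = np.asarray(x, dtype=float)[order]
    ys = np.asarray(y, dtype=float)[order]
    n = len(ys)
    sse = prefix_sse(ys)

    def loss(k):
        return sse(0, k) + sse(k, n)

    best_k, best = 1, loss(1)
    intervals = [(1, n - 1)]  # candidate split positions still alive
    while intervals:
        lo, hi = intervals.pop()
        # Upper bound: evaluate the two endpoints of the interval.
        for k in (lo, hi):
            value = loss(k)
            if value < best:
                best, best_k = value, k
        if hi - lo <= 1:
            continue  # no interior positions left to explore
        # Lower bound: for every k in [lo, hi], the first lo points are on the
        # left and the points from hi onward are on the right, and the SSE of
        # a set is never smaller than the SSE of one of its subsets.
        lower = sse(0, lo) + sse(hi, n)
        if lower >= best:
            continue  # prune: no split in this interval can improve on best
        # Branch: cut the interval at evenly spaced ranks, i.e. at quantiles
        # of the feature values that fall inside it.
        cuts = np.unique(np.linspace(lo, hi, s + 1).round().astype(int))
        intervals.extend(zip(cuts[:-1], cuts[1:]))

    threshold = 0.5 * (xs[best_k - 1] + xs[best_k])
    return float(threshold), float(best)


if __name__ == "__main__":
    x = np.arange(20.0)
    y = np.where(x < 7, 0.0, 5.0) + 0.1 * np.sin(x)
    print(quantile_bnb_stump(x, y, s=4))
```

In this sketch, a larger `s` evaluates more candidate thresholds per round but tightens the bounds faster, while a smaller `s` does less work per round at the cost of more rounds. Extending the idea to deeper trees, multiple features, or classification losses requires bounds that this example does not attempt to reproduce.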