Title
Rule Covering for Interpretation and Boosting
Authors
Abstract
We propose two algorithms for interpretation and boosting of tree-based ensemble methods. Both algorithms make use of mathematical programming models constructed with a set of rules extracted from an ensemble of decision trees. The objective is to obtain the minimum total impurity with the fewest rules that cover all the samples. The first algorithm uses the collection of decision trees obtained from a trained random forest model. Our numerical results show that the proposed rule covering approach selects only a few rules that can be used for interpreting the random forest model. Moreover, the resulting set of rules closely matches the accuracy level of the random forest model. Inspired by the column generation algorithm in linear programming, our second algorithm uses a rule generation scheme for boosting decision trees. We use the dual optimal solutions of the linear programming models as sample weights to generate only those rules that would improve the accuracy. In a computational study, we observe that our second algorithm performs competitively with other well-known boosting methods. Our implementations also demonstrate that both algorithms can easily be coupled with existing random forest and decision tree packages.
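The covering objective described above (select rules that cover every sample while keeping total impurity low) can be illustrated with a small sketch. The paper solves this with a mathematical programming model; the snippet below instead uses a simple greedy weighted set-cover heuristic on hypothetical rule data, purely to convey the intuition of the trade-off.

```python
# Toy sketch of the rule-covering idea: each rule has an impurity and a
# set of sample indices it covers; we want full coverage at low total
# impurity.  This greedy heuristic (NOT the paper's exact optimization
# model) repeatedly picks the rule with the best impurity-per-newly-
# covered-sample ratio.  The rule data below is made up for illustration.

def greedy_rule_cover(rules, n_samples):
    """rules: list of (impurity, covered_sample_ids) tuples."""
    uncovered = set(range(n_samples))
    chosen = []
    while uncovered:
        # Rule with the lowest impurity per newly covered sample.
        best = min(
            (r for r in rules if uncovered & r[1]),
            key=lambda r: r[0] / len(uncovered & r[1]),
        )
        chosen.append(best)
        uncovered -= best[1]
    return chosen

rules = [
    (0.10, {0, 1, 2}),        # (impurity, samples covered)
    (0.05, {2, 3}),
    (0.20, {0, 1, 2, 3, 4}),
    (0.02, {4}),
]
selected = greedy_rule_cover(rules, n_samples=5)
print(len(selected), "rules cover all samples:",
      set().union(*(s for _, s in selected)) == {0, 1, 2, 3, 4})
```

A greedy heuristic like this only approximates the covering objective; the abstract's approach instead optimizes it exactly via mathematical programming, and its column generation variant prices out new rules using the LP dual values as sample weights.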