Paper title
Optimally Weighted Ensembles of Regression Models: Exact Weight Optimization and Applications
Paper authors
Paper abstract
Automated model selection is often offered to users as a way to choose which machine learning model (or method) to apply to a given regression task. In this paper, we show that combining different regression models can yield better results than selecting a single ('best') regression model, and we outline an efficient method that obtains an optimally weighted convex linear combination from a heterogeneous set of regression models. More specifically, the heuristic weight optimization used in a preceding conference paper is replaced by an exact optimization algorithm based on convex quadratic programming. We prove convexity of the quadratic programming formulation both for the straightforward formulation and for a formulation with weighted data points. The novel weight optimization is not only (more) exact but also more efficient. The methods developed in this paper are implemented and made available as open source on GitHub. They can be executed on commonly available hardware and offer a transparent, easy-to-interpret interface. The results indicate that the approach outperforms model selection methods on a range of data sets, including data sets with mixed variable types from drug discovery applications.
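To make the idea of an optimally weighted convex combination concrete, the following is a minimal sketch and not the authors' implementation. It assumes the objective is the mean squared error of the combined prediction, which gives a quadratic function of the weights with Hessian 2Q (Q being the matrix of averaged products of model predictions, hence positive semidefinite and the problem convex). The abstract describes an exact quadratic programming solver; this sketch swaps in plain projected gradient descent onto the probability simplex to stay dependency-free. The names `project_to_simplex` and `fit_convex_weights`, and the toy data, are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # the last k that passes the check fixes the threshold
    return [max(x - theta, 0.0) for x in v]


def fit_convex_weights(predictions, targets, steps=5000):
    """Approximate the convex weights that minimize the ensemble MSE.

    predictions: one list of predicted values per model (assumed layout).
    targets: the observed values, same length as each prediction list.
    """
    m, n = len(predictions), len(targets)
    # MSE(w) = w'Qw - 2c'w + const, with Q[i][j] = mean(p_i * p_j), c[i] = mean(p_i * y).
    Q = [[sum(a * b for a, b in zip(pi, pj)) / n for pj in predictions]
         for pi in predictions]
    c = [sum(a * y for a, y in zip(pi, targets)) / n for pi in predictions]
    # 2 * trace(Q) bounds the largest eigenvalue of the Hessian 2Q,
    # so its reciprocal is a safe (if conservative) step size.
    trace = sum(Q[i][i] for i in range(m))
    step = 1.0 / (2.0 * trace) if trace > 0 else 1.0
    w = [1.0 / m] * m  # start from the uniform ensemble
    for _ in range(steps):
        grad = [2.0 * (sum(Q[i][j] * w[j] for j in range(m)) - c[i])
                for i in range(m)]
        w = project_to_simplex([w[i] - step * grad[i] for i in range(m)])
    return w


# Toy data (made up for illustration): two biased models whose errors cancel,
# plus a constant model that should receive (almost) no weight.
targets = [1.0, 2.0, 3.0, 4.0]
predictions = [
    [2.0, 3.0, 4.0, 5.0],  # model A overestimates by 1
    [0.0, 1.0, 2.0, 3.0],  # model B underestimates by 1
    [4.0, 4.0, 4.0, 4.0],  # model C predicts a constant
]
weights = fit_convex_weights(predictions, targets)
print([round(x, 3) for x in weights])
```

In this toy case no single model is exact, yet an equal mix of models A and B reproduces the targets, which illustrates why a weighted combination can beat selecting one 'best' model. For real use, a dedicated quadratic programming solver would be the closer match to the exact optimization the abstract describes.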