Paper Title
Pre-training helps Bayesian optimization too
Paper Authors
Paper Abstract
Bayesian optimization (BO) has become a popular strategy for global optimization of many expensive real-world functions. Contrary to the common belief that BO is suited to optimizing black-box functions, successfully deploying it actually requires domain knowledge about the characteristics of those functions. Such domain knowledge often manifests in Gaussian process priors that specify initial beliefs about the functions. However, even with expert knowledge, selecting a prior is not an easy task. This is especially true for hyperparameter tuning problems on complex machine learning models, where the landscapes of tuning objectives are often difficult to comprehend. We seek an alternative practice for setting these functional priors. In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori. To verify our approach in realistic model training setups, we collected a large multi-task hyperparameter tuning dataset by training tens of thousands of configurations of near-state-of-the-art models on popular image and text datasets, as well as a protein sequence dataset. Our results show that, on average, our method is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods.
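The core idea in the abstract, pre-training a Gaussian process prior on data from similar functions and then running BO with that prior on a new task, can be sketched in a few lines. The sketch below is ours, not the authors' implementation: the RBF kernel, the toy family of shifted sine objectives, and the expected-improvement acquisition are all illustrative assumptions. Here "pre-training" simply means fitting shared kernel hyperparameters by maximizing the summed GP log marginal likelihood across the related tasks.

```python
# Minimal sketch: pre-train GP kernel hyperparameters on several related
# tasks, then use the resulting prior for BO on a new sibling task.
# Kernel, toy objectives, and acquisition function are assumptions,
# not the paper's actual setup.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def rbf(X1, X2, ls, var):
    """Squared-exponential kernel with lengthscale `ls` and variance `var`."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / ls**2)

def neg_log_marginal_likelihood(theta, tasks, noise=1e-3):
    """Summed GP negative log marginal likelihood over all pre-training tasks."""
    ls, var = np.exp(theta)  # optimize in log space to keep values positive
    nll = 0.0
    for X, y in tasks:
        K = rbf(X, X, ls, var) + noise * np.eye(len(X))
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        nll += 0.5 * y @ alpha + np.log(np.diag(L)).sum()
    return nll

def pretrain_prior(tasks):
    """'Pre-train' the prior: fit shared hyperparameters on similar functions."""
    res = minimize(neg_log_marginal_likelihood, x0=np.zeros(2), args=(tasks,))
    return np.exp(res.x)  # (lengthscale, signal variance)

def gp_posterior(Xq, X, y, ls, var, noise=1e-3):
    """Posterior mean and std at query points under the pre-trained kernel."""
    K = rbf(X, X, ls, var) + noise * np.eye(len(X))
    Ks = rbf(X, Xq, ls, var)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    mu = Ks.T @ alpha
    std = np.sqrt(np.clip(var - (v**2).sum(0), 1e-12, None))
    return mu, std

# Toy usage: three related 1-D "tuning objectives" (shifted sine siblings).
rng = np.random.default_rng(0)
f = lambda x, s: np.sin(3 * x[:, 0] + s)
tasks = []
for s in (0.0, 0.3, 0.6):
    X = rng.uniform(0, 2, (30, 1))
    tasks.append((X, f(X, s)))
ls, var = pretrain_prior(tasks)

# BO (minimization) on a new sibling task with expected improvement.
Xnew = rng.uniform(0, 2, (3, 1))
ynew = f(Xnew, 0.9)
grid = np.linspace(0, 2, 200)[:, None]
for _ in range(10):
    mu, std = gp_posterior(grid, Xnew, ynew, ls, var)
    best = ynew.min()
    z = (best - mu) / std
    ei = (best - mu) * norm.cdf(z) + std * norm.pdf(z)
    x_next = grid[np.argmax(ei)][None, :]
    Xnew = np.vstack([Xnew, x_next])
    ynew = np.append(ynew, f(x_next, 0.9))
print("best value found:", ynew.min())
```

Sharing only the kernel hyperparameters is the simplest form of the idea; the same pattern extends to richer pre-trained priors, for example one that also learns the GP mean function from the related tasks.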