Adaptation Strategies for Automated Machine Learning on Evolving Data

Paper Title

Adaptation Strategies for Automated Machine Learning on Evolving Data

Authors

Celik, Bilge; Vanschoren, Joaquin

Abstract

Automated Machine Learning (AutoML) systems have been shown to efficiently build good models for new datasets. However, it is often not clear how well they can adapt when the data evolves over time. The main goal of this study is to understand the effect of data stream challenges such as concept drift on the performance of AutoML methods, and which adaptation strategies can be employed to make them more robust. To that end, we propose 6 concept drift adaptation strategies and evaluate their effectiveness on different AutoML approaches. We do this for a variety of AutoML approaches for building machine learning pipelines, including those that leverage Bayesian optimization, genetic programming, and random search with automated stacking. These are evaluated empirically on real-world and synthetic data streams with different types of concept drift. Based on this analysis, we propose ways to develop more sophisticated and robust AutoML techniques.
