AutoML to Date and Beyond: Challenges and Opportunities

Paper Title

AutoML to Date and Beyond: Challenges and Opportunities

Authors

Santu, Shubhra Kanti Karmaker; Hassan, Md. Mahadi; Smith, Micah J.; Xu, Lei; Zhai, ChengXiang; Veeramachaneni, Kalyan

Abstract

As big data becomes ubiquitous across domains, and more and more stakeholders aspire to make the most of their data, demand for machine learning tools has spurred researchers to explore the possibilities of automated machine learning (AutoML). AutoML tools aim to make machine learning accessible for non-machine learning experts (domain experts), to improve the efficiency of machine learning, and to accelerate machine learning research. But although automation and efficiency are among AutoML's main selling points, the process still requires human involvement at a number of vital steps, including understanding the attributes of domain-specific data, defining prediction problems, creating a suitable training data set, and selecting a promising machine learning technique. These steps often require a prolonged back-and-forth that makes this process inefficient for domain experts and data scientists alike, and keeps so-called AutoML systems from being truly automatic. In this review article, we introduce a new classification system for AutoML systems, using a seven-tiered schematic to distinguish these systems based on their level of autonomy. We begin by describing what an end-to-end machine learning pipeline actually looks like, and which subtasks of the machine learning pipeline have been automated so far. We highlight those subtasks which are still done manually - generally by a data scientist - and explain how this limits domain experts' access to machine learning. Next, we introduce our novel level-based taxonomy for AutoML systems and define each level according to the scope of automation support provided. Finally, we lay out a roadmap for the future, pinpointing the research required to further automate the end-to-end machine learning pipeline and discussing important challenges that stand in the way of this ambitious goal.
