Paper Title

Operationalizing Machine Learning: An Interview Study

Authors

Shreya Shankar, Rolando Garcia, Joseph M. Hellerstein, Aditya G. Parameswaran

Abstract

Organizations rely on machine learning engineers (MLEs) to operationalize ML, i.e., deploy and maintain ML pipelines in production. The process of operationalizing ML, or MLOps, consists of a continual loop of (i) data collection and labeling, (ii) experimentation to improve ML performance, (iii) evaluation throughout a multi-staged deployment process, and (iv) monitoring of performance drops in production. When considered together, these responsibilities seem staggering -- how does anyone do MLOps, what are the unaddressed challenges, and what are the implications for tool builders? We conducted semi-structured ethnographic interviews with 18 MLEs working across many applications, including chatbots, autonomous vehicles, and finance. Our interviews expose three variables that govern success for a production ML deployment: Velocity, Validation, and Versioning. We summarize common practices for successful ML experimentation, deployment, and sustaining production performance. Finally, we discuss interviewees' pain points and anti-patterns, with implications for tool design.
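
To make the four-stage loop described in the abstract concrete, below is a minimal, purely illustrative Python sketch (not from the paper): each stage of the MLOps cycle is reduced to a placeholder function, and the function names, thresholds, and synthetic data are hypothetical choices for illustration only.

```python
# Illustrative sketch (assumption, not the paper's method): the continual
# MLOps loop of (i) data collection/labeling, (ii) experimentation,
# (iii) staged evaluation, and (iv) production monitoring.
import random


def collect_and_label():
    # (i) Data collection and labeling: synthetic (x, y) pairs stand in
    # for labeled production data.
    return [(x, 2 * x + random.gauss(0, 0.1)) for x in range(1, 101)]


def experiment(data):
    # (ii) Experimentation: fit a trivial "model" (average slope estimate).
    return sum(y / x for x, y in data) / len(data)


def evaluate(model, data, threshold=0.5):
    # (iii) Evaluation in a multi-staged deployment: promote only if the
    # mean absolute error is below a (hypothetical) threshold.
    error = sum(abs(y - model * x) for x, y in data) / len(data)
    return error < threshold


def monitor(model, live_data, threshold=1.0):
    # (iv) Monitoring: flag a performance drop on fresh production data.
    error = sum(abs(y - model * x) for x, y in live_data) / len(live_data)
    return error > threshold


if __name__ == "__main__":
    for iteration in range(3):  # the loop is continual; three rounds shown
        data = collect_and_label()
        model = experiment(data)
        if evaluate(model, data):
            live = collect_and_label()  # stand-in for live traffic
            if monitor(model, live):
                print(f"iteration {iteration}: drift detected, retrain")
            else:
                print(f"iteration {iteration}: model healthy in production")
```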
