Paper Title

Operationalizing Machine Learning: An Interview Study

Authors

Shreya Shankar, Rolando Garcia, Joseph M. Hellerstein, Aditya G. Parameswaran

Abstract

Organizations rely on machine learning engineers (MLEs) to operationalize ML, i.e., deploy and maintain ML pipelines in production. The process of operationalizing ML, or MLOps, consists of a continual loop of (i) data collection and labeling, (ii) experimentation to improve ML performance, (iii) evaluation throughout a multi-staged deployment process, and (iv) monitoring of performance drops in production. When considered together, these responsibilities seem staggering -- how does anyone do MLOps, what are the unaddressed challenges, and what are the implications for tool builders? We conducted semi-structured ethnographic interviews with 18 MLEs working across many applications, including chatbots, autonomous vehicles, and finance. Our interviews expose three variables that govern success for a production ML deployment: Velocity, Validation, and Versioning. We summarize common practices for successful ML experimentation, deployment, and sustaining production performance. Finally, we discuss interviewees' pain points and anti-patterns, with implications for tool design.
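
To make the four-stage loop described in the abstract concrete, below is a minimal, purely illustrative Python sketch (not from the paper): each stage of the MLOps cycle is reduced to a placeholder function, and the function names, thresholds, and synthetic data are hypothetical choices for illustration only.

```python
# Illustrative sketch (assumption, not the paper's method): the continual
# MLOps loop of (i) data collection/labeling, (ii) experimentation,
# (iii) staged evaluation, and (iv) production monitoring.
import random


def collect_and_label():
    # (i) Data collection and labeling: synthetic (x, y) pairs stand in
    # for labeled production data.
    return [(x, 2 * x + random.gauss(0, 0.1)) for x in range(1, 101)]


def experiment(data):
    # (ii) Experimentation: fit a trivial "model" (average slope estimate).
    return sum(y / x for x, y in data) / len(data)


def evaluate(model, data, threshold=0.5):
    # (iii) Evaluation in a multi-staged deployment: promote only if the
    # mean absolute error is below a (hypothetical) threshold.
    error = sum(abs(y - model * x) for x, y in data) / len(data)
    return error < threshold


def monitor(model, live_data, threshold=1.0):
    # (iv) Monitoring: flag a performance drop on fresh production data.
    error = sum(abs(y - model * x) for x, y in live_data) / len(live_data)
    return error > threshold


if __name__ == "__main__":
    for iteration in range(3):  # the loop is continual; three rounds shown
        data = collect_and_label()
        model = experiment(data)
        if evaluate(model, data):
            live = collect_and_label()  # stand-in for live traffic
            if monitor(model, live):
                print(f"iteration {iteration}: drift detected, retrain")
            else:
                print(f"iteration {iteration}: model healthy in production")
```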
