Paper Title
Studying the Practices of Deploying Machine Learning Projects on Docker
Paper Authors
Paper Abstract
Docker is a containerization service that allows for convenient deployment of websites, databases, applications' APIs, and machine learning (ML) models with a few lines of code. Recent studies have explored the use of Docker for deploying general software projects, but without a specific focus on how Docker is used to deploy ML-based projects. We conducted an exploratory study to understand how Docker is being used to deploy ML-based projects. As a first step, we examined the categories of ML-based projects that use Docker. We then examined why and how these projects use Docker, and the characteristics of the resulting Docker images. Our results indicate that six categories of ML-based projects use Docker for deployment: ML Applications, MLOps/AIOps, Toolkits, DL Frameworks, Models, and Documentation. We derived a taxonomy of 21 major categories representing the purposes of using Docker, including purposes specific to models such as model management tasks (e.g., testing, training). We then showed that ML engineers use Docker images mostly to help with platform portability, such as transferring software across operating systems, runtimes (e.g., GPU), and language constraints. However, we also found that more resources may be required to run the Docker images used to build ML-based software projects, due to the large number of files contained in image layers with deeply nested directories. We hope to shed light on the emerging practices of deploying ML software projects using containers and to highlight aspects that should be improved.
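For concreteness, the following is a minimal sketch (not taken from the study) of what deploying an ML model "with a few lines of code" can look like, here using the Python Docker SDK (docker-py); the project path, image tag, serving port, and Dockerfile contents are hypothetical assumptions, and the project directory is assumed to already contain a Dockerfile for the model-serving code.

import docker

# Connect to the local Docker daemon using environment defaults.
client = docker.from_env()

# Build an image from a (hypothetical) project directory that contains a
# Dockerfile; such a Dockerfile would typically start from an ML base image
# (e.g., tensorflow/tensorflow or pytorch/pytorch) and copy in the serving code.
image, build_logs = client.images.build(path="./ml-model-service", tag="ml-model:latest")

# Run the image as a detached container, exposing the (hypothetical)
# model-serving API on port 8080 of the host.
container = client.containers.run(
    "ml-model:latest",
    detach=True,
    ports={"8080/tcp": 8080},
)

print(container.status)

The same workflow can equally be driven from the command line (docker build followed by docker run); the sketch above only illustrates the small amount of deployment code the abstract refers to.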