Paper title
Parallelizing Machine Learning as a Service for the End-User
Paper authors
Paper abstract
As ML applications become ever more pervasive, fully trained systems are increasingly made available to a wide public, allowing end-users to submit queries with their own data and to retrieve results efficiently. As such services grow more sophisticated, a new challenge is how to scale to ever-growing user bases. In this paper, we present a distributed architecture that can be exploited to parallelize a typical ML system pipeline. We propose a case study consisting of a text mining service and discuss how the method can be generalized to many similar applications. We demonstrate the significance of the computational gain afforded by the distributed architecture through an extensive experimental evaluation.