Adoption and Effects of Software Engineering Best Practices in Machine Learning

Paper Title

Adoption and Effects of Software Engineering Best Practices in Machine Learning

Authors

Alex Serban, Koen van der Blom, Holger Hoos, Joost Visser

Abstract

The increasing reliance on applications with machine learning (ML) components calls for mature engineering techniques that ensure these are built in a robust and future-proof manner. We aim to empirically determine the state of the art in how teams develop, deploy and maintain software with ML components. We mined both academic and grey literature and identified 29 engineering best practices for ML applications. We conducted a survey among 313 practitioners to determine the degree of adoption for these practices and to validate their perceived effects. Using the survey responses, we quantified practice adoption, differentiated along demographic characteristics, such as geography or team size. We also tested correlations and investigated linear and non-linear relationships between practices and their perceived effect using various statistical models. Our findings indicate, for example, that larger teams tend to adopt more practices, and that traditional software engineering practices tend to have lower adoption than ML specific practices. Also, the statistical models can accurately predict perceived effects such as agility, software quality and traceability, from the degree of adoption for specific sets of practices. Combining practice adoption rates with practice importance, as revealed by statistical models, we identify practices that are important but have low adoption, as well as practices that are widely adopted but are less important for the effects we studied. Overall, our survey and the analysis of responses received provide a quantitative basis for assessment and step-wise improvement of practice adoption by ML teams.
