Paper Title
Do Not Take It for Granted: Comparing Open-Source Libraries for Software Development Effort Estimation
Paper Authors
Paper Abstract
In the past two decades, several Machine Learning (ML) libraries have become freely available. Many studies have used such libraries to carry out empirical investigations on predictive Software Engineering (SE) tasks. However, the differences stemming from using one library over another have been overlooked, implicitly assuming that using any of these libraries would provide the user with the same or very similar results. This paper aims to raise awareness of the differences incurred when using different ML libraries for software development effort estimation (SEE), one of the most widely studied SE prediction tasks. To this end, we investigate 4 deterministic machine learners as provided by 3 of the most popular open-source ML libraries written in different languages (namely, Scikit-Learn, Caret and Weka). We carry out a thorough empirical study comparing the performance of the machine learners on 5 SEE datasets in the two most common SEE scenarios (i.e., out-of-the-box-ml and tuned-ml), as well as an in-depth analysis of the documentation and code of their APIs. The results of our study reveal that the predictions provided by the 3 libraries differ in 95% of the cases on average across a total of 105 cases studied. These differences are significantly large in most cases and yield misestimations of up to approx. 3,000 hours per project. Moreover, our API analysis reveals that these libraries provide the user with different levels of control over the parameters one can manipulate, and exhibit an overall lack of clarity and consistency that might mislead users. Our findings highlight that the ML library is an important design choice for SEE studies, one that can lead to a difference in performance; however, such a difference is under-documented. We conclude by highlighting open challenges, with suggestions for the developers of these libraries as well as for the researchers and practitioners using them.
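To make the two evaluation scenarios concrete, below is a minimal sketch in Scikit-Learn (one of the three libraries studied). The choice of learner (a CART-style decision tree), the synthetic data, the MAE metric, and the hyperparameter grid are all illustrative assumptions; the abstract does not specify which learners, datasets, or tuning budgets the paper actually uses.

```python
# Minimal sketch of the out-of-the-box-ml vs. tuned-ml scenarios,
# using Scikit-Learn only. Learner, grid, and data are hypothetical.
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
# Hypothetical stand-in for an SEE dataset: project features -> effort (hours).
X = rng.normal(size=(100, 5))
y = np.abs(X @ rng.normal(size=5)) * 500

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scenario 1: out-of-the-box-ml -- the learner with library defaults, no tuning.
ootb = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
print("out-of-the-box MAE:", mean_absolute_error(y_test, ootb.predict(X_test)))

# Scenario 2: tuned-ml -- the same learner with an (illustrative) grid search.
grid = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [2, 4, 8, None], "min_samples_leaf": [1, 2, 5]},
    scoring="neg_mean_absolute_error",
    cv=5,
)
grid.fit(X_train, y_train)
print("tuned MAE:", mean_absolute_error(y_test, grid.predict(X_test)))
```

Note that repeating the out-of-the-box scenario in Caret or Weka would not necessarily reproduce these predictions: as the abstract points out, each library exposes a different set of tunable parameters with its own defaults for the same nominal learner, which is one plausible source of the divergence the paper reports.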