Paper Title
Agile Effort Estimation: Have We Solved the Problem Yet? Insights From A Second Replication Study (GPT2SP Replication Report)
Paper Authors
Abstract
Fu and Tantithamthavorn have recently proposed GPT2SP, a Transformer-based deep learning model for story point (SP) estimation of user stories. They empirically evaluated the performance of GPT2SP on a dataset shared by Choetkiertikul et al., comprising 16 projects with a total of 23,313 issues. They benchmarked GPT2SP against two baselines (namely the naive Mean and Median estimators) and the method previously proposed by Choetkiertikul et al. (which we will refer to as DL2SP from now on) for both within- and cross-project estimation scenarios, and evaluated the extent to which each component of GPT2SP contributes to the accuracy of the SP estimates. Their results show that GPT2SP outperforms DL2SP, with a 6%-47% improvement in Mean Absolute Error (MAE) for the within-project scenario and a 3%-46% improvement for the cross-project scenario. However, when we attempted to use the GPT2SP source code made available by Fu and Tantithamthavorn to reproduce their experiments, we found a bug in the computation of the MAE, which may have inflated the accuracy of GPT2SP reported in their work. We therefore issued a pull request to fix this bug, which has been accepted and merged into their repository at https://github.com/awsm-research/gpt2sp/pull/2. In this report, we describe the results we achieved by using the fixed version of GPT2SP to replicate the experiments conducted in the original paper for RQ1 and RQ2. Following the original study, we analyse the results considering the MAE of the estimation methods over all issues in each project, but we also report the Median Absolute Error (MdAE) and the Standard Accuracy (SA) for completeness.
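For readers unfamiliar with the three measures named above, the following is a minimal Python sketch of how MAE, MdAE, and SA are commonly computed. It is not the GPT2SP code and does not reproduce the bug or its fix. All function names are hypothetical. The SA definition used here, (1 - MAE / MAE of random guessing) x 100, with the random-guessing baseline computed exactly as the mean absolute difference over all pairs of distinct actual values, is an assumption about the variant intended; the report may use a sampled baseline instead.

```python
from statistics import mean, median


def absolute_errors(actual, predicted):
    """Per-issue absolute error |actual - predicted|."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [abs(a - p) for a, p in zip(actual, predicted)]


def mae(actual, predicted):
    """Mean Absolute Error over all issues."""
    return mean(absolute_errors(actual, predicted))


def mdae(actual, predicted):
    """Median Absolute Error over all issues."""
    return median(absolute_errors(actual, predicted))


def random_guess_mae(actual):
    """Expected MAE of random guessing (assumed exact variant).

    For each issue, the guess is the actual value of another,
    uniformly chosen issue, so the expectation is the mean absolute
    difference over all ordered pairs of distinct issues.
    """
    n = len(actual)
    if n < 2:
        raise ValueError("need at least two issues")
    total = sum(
        abs(actual[i] - actual[j])
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return total / (n * (n - 1))


def standard_accuracy(actual, predicted):
    """SA in percent: improvement of a method over random guessing."""
    return (1 - mae(actual, predicted) / random_guess_mae(actual)) * 100


# Illustrative story points only, not data from the study.
actual_sp = [1, 2, 3, 5]
predicted_sp = [2, 2, 5, 3]
print(mae(actual_sp, predicted_sp))
print(mdae(actual_sp, predicted_sp))
print(standard_accuracy(actual_sp, predicted_sp))
```

Lower MAE and MdAE indicate better estimates, while a higher SA indicates a larger improvement over random guessing; an SA near zero or below suggests the method does no better than guessing.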