Paper Title
Using Source Code Density to Improve the Accuracy of Automatic Commit Classification into Maintenance Activities
Paper Authors
Abstract
Source code is changed for a reason, e.g., to adapt, correct, or perfect it. This reason can provide valuable insight into the development process but is rarely explicitly documented when the change is committed to a source code repository. Automatic commit classification uses features extracted from commits to estimate this reason. We introduce source code density, a measure of the net size of a commit, and show how it improves the accuracy of automatic commit classification compared to previous size-based classifications. We also investigate how preceding generations of commits affect the class of a commit, and whether taking the code density of previous commits into account can improve the accuracy further. We achieve up to 89% accuracy and a Kappa of 0.82 for the cross-project commit classification where the model is trained on one project and applied to other projects. Models trained on single projects yield accuracies of up to 93% with a Kappa approaching 0.90. The accuracy of the automatic commit classification has a direct impact on software (process) quality analyses that exploit the classification, so our improvements to the accuracy will also improve the confidence in such analyses.
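The abstract reports results as accuracy together with a Kappa value. As a minimal sketch of how these two figures relate, the Python below computes plain accuracy and Cohen's kappa for a set of commit labels. It assumes the reported Kappa is Cohen's kappa; the function names, the single-letter labels ("a", "c", "p" for adaptive, corrective, perfective), and the sample data are illustrative and not taken from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of commits whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: per-label product of the two marginal frequencies.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        # Both lists contain a single identical label; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels, for illustration only.
y_true = ["a", "a", "c", "c", "p", "p", "p", "c"]
y_pred = ["a", "c", "c", "c", "p", "p", "a", "c"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"kappa    = {cohen_kappa(y_true, y_pred):.3f}")
```

Kappa is lower than accuracy whenever some agreement would be expected by chance, which is why the abstract states both numbers for each setting.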