Using Source Code Density to Improve the Accuracy of Automatic Commit Classification into Maintenance Activities

Title


Using Source Code Density to Improve the Accuracy of Automatic Commit Classification into Maintenance Activities

Authors

Hönel, Sebastian; Ericsson, Morgan; Löwe, Welf; Wingkvist, Anna

Abstract


Source code is changed for a reason, e.g., to adapt, correct, or perfect it. This reason can provide valuable insight into the development process but is rarely documented explicitly when the change is committed to a source code repository. Automatic commit classification uses features extracted from commits to estimate this reason. We introduce source code density, a measure of the net size of a commit, and show how it improves the accuracy of automatic commit classification compared to previous size-based classifications. We also investigate how preceding generations of commits affect the class of a commit, and whether taking the code density of previous commits into account can improve the accuracy further. For cross-project commit classification, where the model is trained on one project and applied to other projects, we achieve up to 89% accuracy and a Kappa of 0.82. Models trained on single projects yield accuracies of up to 93% with a Kappa approaching 0.90. The accuracy of automatic commit classification has a direct impact on software (process) quality analyses that exploit the classification, so our improvements in accuracy also increase confidence in such analyses.
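The abstract describes source code density only as "a measure of the net size of a commit". The Python sketch below illustrates one plausible reading. It assumes that density is the ratio of net changed lines (excluding blank and comment-only lines) to gross changed lines in a unified diff. The comment markers and the names `CommitSize`, `commit_size`, and `cohens_kappa` are hypothetical, not the authors' tooling. The results are reported as accuracy plus Kappa, so the sketch also computes Cohen's kappa, assuming that is the Kappa statistic meant. The class labels a/c/p in the usage example are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical, deliberately simple comment markers. A real tool would
# need a language-aware parser (block comments, strings, preprocessor).
COMMENT_PREFIXES = ("//", "#")


@dataclass
class CommitSize:
    gross: int  # every added or removed line in the diff
    net: int    # added or removed lines that carry code

    @property
    def density(self) -> float:
        """Assumed definition: the share of the gross change that is net code."""
        return self.net / self.gross if self.gross else 0.0


def is_noise(line: str) -> bool:
    """A changed line is noise if it is blank or comment-only."""
    stripped = line.strip()
    return not stripped or stripped.startswith(COMMENT_PREFIXES)


def commit_size(diff: str) -> CommitSize:
    """Count gross and net changed lines in a unified diff."""
    gross = net = 0
    for line in diff.splitlines():
        # Skip file headers ("--- a/...", "+++ b/..."); simplification:
        # this also skips removed lines whose content starts with "--".
        if line.startswith(("---", "+++")):
            continue
        if line.startswith(("+", "-")):
            gross += 1
            if not is_noise(line[1:]):
                net += 1
    return CommitSize(gross, net)


def cohens_kappa(actual: list[str], predicted: list[str]) -> float:
    """Cohen's kappa, (p_o - p_e) / (1 - p_e): agreement beyond chance."""
    n = len(actual)
    labels = set(actual) | set(predicted)
    p_o = sum(a == p for a, p in zip(actual, predicted)) / n
    p_e = sum((actual.count(c) / n) * (predicted.count(c) / n) for c in labels)
    if p_e == 1.0:  # both raters always use one and the same label
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    diff = """--- a/app.py
+++ b/app.py
@@ -1,2 +1,4 @@
-x = 1
+x = 2
+# bump x
+
 print(x)
"""
    size = commit_size(diff)
    print(size.gross, size.net, size.density)  # 4 2 0.5

    # Illustrative maintenance classes: a(daptive), c(orrective), p(erfective).
    actual = ["c", "a", "p", "c"]
    predicted = ["c", "a", "c", "c"]
    print(round(cohens_kappa(actual, predicted), 3))  # 0.556
```

In the example, the comment and the blank line count toward the gross size but not the net size. As a result, a four-line change has a density of 0.5. A low density flags commits whose size is dominated by non-functional edits, which a purely gross, size-based feature cannot distinguish.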
