How We Refactor and How We Document it? On the Use of Supervised Machine Learning Algorithms to Classify Refactoring Documentation

Paper Title

How We Refactor and How We Document it? On the Use of Supervised Machine Learning Algorithms to Classify Refactoring Documentation

Authors

AlOmar, Eman Abdullah; Peruma, Anthony; Mkaouer, Mohamed Wiem; Newman, Christian; Ouni, Ali; Kessentini, Marouane

Abstract

Refactoring is the art of improving the design of a system without altering its external behavior. Refactoring has become a well-established and disciplined software engineering practice that has attracted a significant amount of research presuming that refactoring is primarily motivated by the need to improve system structures. However, recent studies have shown that developers may incorporate refactorings in other development activities that go beyond improving the design. Unfortunately, these studies are limited to developer interviews and a reduced set of projects. To cope with the above-mentioned limitations, we aim to better understand what motivates developers to apply refactoring by mining and classifying a large set of 111,884 commits containing refactorings, extracted from 800 Java projects. We trained a multi-class classifier to categorize these commits into 3 categories, namely, Internal QA, External QA, and Code Smell Resolution, along with the traditional BugFix and Functional categories. This classification challenges the original definition of refactoring as being exclusive to improving the design and fixing code smells. Further, to better understand our classification results, we analyzed commit messages to extract textual patterns that developers regularly use to describe their refactorings. The results show that (1) fixing code smells is not the main driver for developers to refactor their codebases; refactoring is solicited for a wide variety of reasons, going beyond its traditional definition; (2) the distribution of refactorings differs between production and test files; (3) developers use several patterns to purposefully target refactoring; (4) the textual patterns, extracted from commit messages, provide better coverage for how developers document their refactorings.
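To make the classification step more concrete, the following is a minimal sketch of multi-class commit-message classification. It is not the authors' pipeline: the abstract does not say which learning algorithm or features were used, so a small multinomial Naive Bayes with Laplace smoothing stands in here. The five labels come from the abstract; the training messages, the `tokenize`, `train`, and `classify` helpers, and the `MODEL` name are all hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy training set: (commit message, category) pairs.
# The five labels follow the abstract; the messages are invented.
TRAINING_COMMITS = [
    ("fix null pointer crash in parser", "BugFix"),
    ("fix bug causing wrong total", "BugFix"),
    ("add new export feature for reports", "Functional"),
    ("implement new login feature", "Functional"),
    ("refactor to improve readability and reduce coupling", "Internal QA"),
    ("simplify class structure to improve maintainability", "Internal QA"),
    ("refactor to improve performance and memory usage", "External QA"),
    ("optimize query to improve performance", "External QA"),
    ("remove duplicate code and dead code", "Code Smell Resolution"),
    ("split long method and remove duplicate logic", "Code Smell Resolution"),
]


def tokenize(message):
    """Lowercase the message and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", message.lower())


def train(samples):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for message, label in samples:
        class_counts[label] += 1
        for token in tokenize(message):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_counts, word_counts, vocabulary


def classify(message, model):
    """Return the label with the highest log-posterior for the message."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total)
        # Laplace-smoothed denominator: class token count + vocabulary size.
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokenize(message):
            if token in vocabulary:  # words never seen in training are skipped
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


MODEL = train(TRAINING_COMMITS)
print(classify("remove duplicate code", MODEL))
```

A real study would train on labeled commits at a far larger scale and evaluate against held-out data; the sketch only shows how a supervised model maps the wording of a commit message to one of the five categories.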
