Paper title
What Makes a Good Commit Message?
Authors
Abstract
A key issue in collaborative software development is communication among developers. One modality of communication is the commit message, in which developers describe the changes they make in a repository. As such, commit messages serve as an "audit trail" by which developers can understand how the source code of a project has changed, and why. Hence, the quality of commit messages affects the effectiveness of communication among developers. Commit messages are often of poor quality because developers lack the time and motivation to craft a good message. Several automatic approaches have been proposed to generate commit messages. However, these are based on uncurated datasets that include considerable proportions of poorly phrased commit messages. In this multi-method study, we first define what constitutes a "good" commit message, and then establish what proportion of commit messages lack information, using a sample of almost 1,600 messages from five highly active open source projects. We find that, on average, about 44% of messages could be improved, suggesting that the use of uncurated datasets may be a major threat when commit message generators are trained on such data. We also observe that prior work has not considered the semantics of commit messages, and that there is surprisingly little guidance available for writing good commit messages. To that end, we develop a taxonomy based on recurring patterns in commit messages' expressions. Finally, we investigate whether "good" commit messages can be automatically identified; such automation could prompt developers to write better commit messages.