Paper title
Is Surprisal in Issue Trackers Actionable?
Authors
Abstract
Background. In information theory, surprisal is a measure of how unexpected an event is. Statistical language models provide a probabilistic approximation of natural languages, and because surprisal is defined in terms of the probability of an event occurring, it is possible to determine the surprisal associated with English sentences. The issues and pull requests in software repository issue trackers give insight into the development process and likely contain the surprising events of this process. Objective. Prior work has identified that unusual events in software repositories are of interest to developers, and has used simple methods based on code metrics to detect them. In this study we will propose a new method for unusual event detection in software repositories using surprisal. With the ability to find surprising issues and pull requests, we intend to analyse them further to determine whether they actually hold importance in a repository, or whether they pose a significant challenge to address. If bad surprises can be found early, before they cause additional trouble, it is plausible that effort, cost and time will be saved as a result. Method. After extracting the issues and pull requests from 5000 of the most popular software repositories on GitHub, we will train a language model to represent these issues. We will measure their perceived importance in the repository, measure their resolution difficulty using several analogues, measure the surprisal of each, and finally generate inferential statistics to describe any correlations.
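To make the notion of surprisal concrete: for an event with probability p, surprisal is -log2(p) bits, so rarer events score higher. The sketch below illustrates how a per-issue surprisal score could be computed. It is not the authors' model: it assumes a unigram language model with add-one smoothing and whitespace tokenisation, and the function names `train_unigram` and `mean_surprisal` as well as the toy issue titles are hypothetical, invented only for illustration.

```python
import math
from collections import Counter


def train_unigram(texts):
    """Count lowercase whitespace tokens across the training texts."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def mean_surprisal(text, counts):
    """Average surprisal in bits per token under an add-one smoothed unigram model."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot shared by all unseen tokens
    bits = 0.0
    for tok in tokens:
        p = (counts[tok] + 1) / (total + vocab)  # smoothed token probability
        bits += -math.log2(p)                    # surprisal of this token
    return bits / len(tokens)


# Toy issue titles (invented for illustration, not data from the study).
corpus = ["fix typo in readme", "fix typo in docs", "update readme"]
counts = train_unigram(corpus)

common = mean_surprisal("fix typo in readme", counts)
rare = mean_surprisal("segfault on startup", counts)
print(f"common: {common:.2f} bits/token, rare: {rare:.2f} bits/token")
```

An issue whose wording is rare in the training corpus receives a higher score than one made of frequently seen tokens. Relating such scores to importance or resolution difficulty, as the Method describes, would be a separate statistical step that this sketch does not cover.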