Paper Title
CC2Vec: Distributed Representations of Code Changes
Paper Authors
Paper Abstract
Existing work on software patches often uses features specific to a single task. These approaches typically rely on manually identified features, and human effort is required to identify such features for each task. In this work, we propose CC2Vec, a neural network model that learns a representation of code changes guided by their accompanying log messages, which represent the semantic intent of the code changes. CC2Vec models the hierarchical structure of a code change with the help of the attention mechanism and uses multiple comparison functions to identify the differences between the removed and added code. To evaluate whether CC2Vec can produce a distributed representation of code changes that is general and useful for multiple tasks on software patches, we use the vectors produced by CC2Vec for three tasks: log message generation, bug-fixing patch identification, and just-in-time defect prediction. In all three tasks, the models using CC2Vec outperform the state-of-the-art techniques.
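The idea of applying multiple comparison functions to the removed and added code can be illustrated with a minimal sketch. The abstract does not name the specific functions, so the four used here (element-wise subtraction, element-wise multiplication, cosine similarity, and Euclidean distance) are assumptions chosen as common examples. The function names and the plain-list vector type are also hypothetical and are not taken from the CC2Vec implementation.

```python
import math
from typing import List

Vector = List[float]


def elementwise_subtraction(removed: Vector, added: Vector) -> Vector:
    # Difference between the two embeddings, dimension by dimension.
    return [r - a for r, a in zip(removed, added)]


def elementwise_multiplication(removed: Vector, added: Vector) -> Vector:
    # Product of the two embeddings, dimension by dimension.
    return [r * a for r, a in zip(removed, added)]


def cosine_similarity(removed: Vector, added: Vector) -> float:
    # Cosine of the angle between the two embeddings; 0.0 if either is all zeros.
    dot = sum(r * a for r, a in zip(removed, added))
    norm_removed = math.sqrt(sum(r * r for r in removed))
    norm_added = math.sqrt(sum(a * a for a in added))
    if norm_removed == 0.0 or norm_added == 0.0:
        return 0.0
    return dot / (norm_removed * norm_added)


def euclidean_distance(removed: Vector, added: Vector) -> float:
    # Straight-line distance between the two embeddings.
    return math.sqrt(sum((r - a) ** 2 for r, a in zip(removed, added)))


def compare(removed: Vector, added: Vector) -> Vector:
    # Assumed combination step: concatenate every comparison output
    # into a single feature vector describing the change.
    return (
        elementwise_subtraction(removed, added)
        + elementwise_multiplication(removed, added)
        + [cosine_similarity(removed, added), euclidean_distance(removed, added)]
    )


if __name__ == "__main__":
    removed_code_embedding = [1.0, 0.0]  # hypothetical embedding of removed lines
    added_code_embedding = [0.0, 1.0]    # hypothetical embedding of added lines
    print(compare(removed_code_embedding, added_code_embedding))
```

In a learned model the two input vectors would come from the hierarchical attention encoder rather than being fixed by hand, and the concatenated output would feed later layers. The sketch only shows how several complementary views of the same removed/added pair can be combined.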