Paper Title
On the Generalizability of Neural Program Models with respect to Semantic-Preserving Program Transformations
Paper Authors
Abstract
With the prevalence of publicly available source code repositories to train deep neural network models, neural program models can do well in source code analysis tasks such as predicting method names in given programs, which cannot be easily done by traditional program analysis techniques. Although such neural program models have been tested on various existing datasets, the extent to which they generalize to unforeseen source code is largely unknown. Since it is very challenging to test neural program models on all unforeseen programs, in this paper, we propose to evaluate the generalizability of neural program models with respect to semantic-preserving transformations: a generalizable neural program model should perform equally well on programs that are of the same semantics but of different lexical appearances and syntactical structures. We compare the results of various neural program models for the method name prediction task on programs before and after automated semantic-preserving transformations. We use three Java datasets of different sizes and three state-of-the-art neural network models for code, namely code2vec, code2seq, and GGNN, to build nine such neural program models for evaluation. Our results show that even with small semantic-preserving changes to the programs, these neural program models often fail to generalize their performance. Our results also suggest that neural program models based on data and control dependencies in programs generalize better than neural program models based only on abstract syntax trees. On the positive side, we observe that as the training dataset grows in size and diversity, the generalizability of correct predictions produced by the neural program models can also improve. Our results on the generalizability of neural program models provide insights to measure their limitations and provide a stepping stone for their improvement.
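To make the notion of a semantic-preserving transformation concrete, the following sketch (an illustrative example, not code from the paper) shows two Java methods with identical semantics but different lexical appearance (renamed identifiers) and syntactic structure (a for loop rewritten as a while loop). A generalizable neural program model should predict the same method name for both versions; the class and method names here are hypothetical.

```java
// Illustrative sketch: a method and its semantic-preserving transformation.
public class Transformations {
    // Original method: sums the elements of an array.
    public static int sumElements(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    // After transformation: identifiers renamed and the for loop
    // rewritten as a while loop. Behavior is unchanged, so a
    // generalizable model should still predict the same method name.
    public static int sumElementsTransformed(int[] a) {
        int s = 0;
        int j = 0;
        while (j < a.length) {
            s += a[j];
            j++;
        }
        return s;
    }
}
```

Transformations of this kind (identifier renaming, loop exchange, and similar rewrites) change the tokens and the abstract syntax tree that models like code2vec and code2seq consume, while leaving the data and control dependencies, and hence the program's semantics, intact.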