Paper Title

StructCoder: Structure-Aware Transformer for Code Generation

Authors

Sindhu Tipirneni, Ming Zhu, Chandan K. Reddy

Abstract

There has been a recent surge of interest in automating software engineering tasks using deep learning. This paper addresses the problem of code generation, where the goal is to generate target code given source code in a different language or a natural language description. Most state-of-the-art deep learning models for code generation use training strategies primarily designed for natural language. However, understanding and generating code requires a more rigorous comprehension of the code syntax and semantics. With this motivation, we develop an encoder-decoder Transformer model where both the encoder and decoder are explicitly trained to recognize the syntax and data flow in the source and target codes, respectively. We not only make the encoder structure-aware by leveraging the source code's syntax tree and data flow graph, but we also support the decoder in preserving the syntax and data flow of the target code by introducing two novel auxiliary tasks: AST (Abstract Syntax Tree) paths prediction and data flow prediction. To the best of our knowledge, this is the first work to introduce a structure-aware Transformer decoder that models both syntax and data flow to enhance the quality of generated code. The proposed StructCoder model achieves state-of-the-art performance on code translation and text-to-code generation tasks in the CodeXGLUE benchmark, and improves over baselines of similar size on the APPS code generation benchmark. Our code is publicly available at https://github.com/reddy-lab-code-research/StructCoder/.
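
To make the two structural signals named in the abstract more concrete, the following is a minimal sketch of what "AST paths" and "data flow" edges can look like for a small program. It is an illustration only, not the authors' implementation: it assumes Python's built-in `ast` module as the parser, the helper names `root_to_leaf_paths` and `def_use_edges` are hypothetical, and the data-flow part is deliberately limited to straight-line assignments.

```python
import ast


def root_to_leaf_paths(source):
    """Return the list of node-type names from the AST root to each leaf."""
    paths = []

    def walk(node, prefix):
        path = prefix + [type(node).__name__]
        # Skip Load/Store/Del context markers so identifiers count as leaves.
        children = [c for c in ast.iter_child_nodes(node)
                    if not isinstance(c, ast.expr_context)]
        if not children:
            paths.append(path)
        for child in children:
            walk(child, path)

    walk(ast.parse(source), [])
    return paths


def def_use_edges(source):
    """Return (variable, defined_on_line, used_on_line) edges.

    Simplification (an assumption of this sketch): only top-level
    assignment statements are handled, so branches, loops, and function
    bodies are ignored.
    """
    last_def = {}
    edges = []
    for stmt in ast.parse(source).body:
        if not isinstance(stmt, ast.Assign):
            continue
        # Uses on the right-hand side refer to the most recent definition.
        for node in ast.walk(stmt.value):
            if isinstance(node, ast.Name) and node.id in last_def:
                edges.append((node.id, last_def[node.id], stmt.lineno))
        # Names stored on the left-hand side become the new definitions.
        for target in stmt.targets:
            for node in ast.walk(target):
                if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                    last_def[node.id] = stmt.lineno
    return edges


if __name__ == "__main__":
    snippet = "x = 1\ny = x + 2\nz = x + y"
    for path in root_to_leaf_paths(snippet):
        print(" -> ".join(path))
    print(def_use_edges(snippet))
```

In a setup like the one the abstract describes, a decoder's auxiliary tasks would predict information of this kind (the node types on each token's root-to-leaf path, and which tokens are linked by data flow) for the target code. How StructCoder encodes and predicts these is specified in the paper and its repository, not by this sketch.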
