Paper Title

M2TS: Multi-Scale Multi-Modal Approach Based on Transformer for Source Code Summarization

Authors

Yuexiu Gao, Chen Lyu

Abstract

Source code summarization aims to generate natural language descriptions of code snippets. Many existing studies learn the syntactic and semantic knowledge of code snippets from their token sequences and Abstract Syntax Trees (ASTs). They use the learned code representations as input to code summarization models, which can accordingly generate summaries describing source code. Traditional models traverse ASTs as sequences or split ASTs into paths as input. However, the former loses the structural properties of ASTs, and the latter destroys the overall structure of ASTs. Therefore, comprehensively capturing the structural features of ASTs in learning code representations for source code summarization remains a challenging problem to be solved. In this paper, we propose M2TS, a Multi-scale Multi-modal approach based on Transformer for source code Summarization. M2TS uses a multi-scale AST feature extraction method, which can extract the structures of ASTs more completely and accurately at multiple local and global levels. To complement missing semantic information in ASTs, we also obtain code token features, and further combine them with the extracted AST features using a cross modality fusion method that not only fuses the syntactic and contextual semantic information of source code, but also highlights the key features of each modality. We conduct experiments on two Java and one Python datasets, and the experimental results demonstrate that M2TS outperforms current state-of-the-art methods. We release our code at https://github.com/TranSMS/M2TS.
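To make concrete the two traditional AST inputs that the abstract contrasts, here is a minimal sketch using Python's standard `ast` module. It is not the M2TS implementation: the helper names `preorder_types` and `root_to_leaf_paths` and the sample snippet are assumptions chosen for illustration only. The first helper flattens the tree into one sequence, discarding parent-child links; the second splits it into independent root-to-leaf paths, so the tree is no longer seen as a whole.

```python
import ast

# Hypothetical sample snippet, used only for illustration.
SNIPPET = "def add(a, b):\n    return a + b\n"


def preorder_types(node):
    """Flatten an AST into a pre-order sequence of node type names."""
    seq = [type(node).__name__]
    for child in ast.iter_child_nodes(node):
        seq.extend(preorder_types(child))
    return seq


def root_to_leaf_paths(node, prefix=()):
    """Split an AST into root-to-leaf paths of node type names."""
    prefix = prefix + (type(node).__name__,)
    children = list(ast.iter_child_nodes(node))
    if not children:
        # A leaf closes one path.
        return [prefix]
    paths = []
    for child in children:
        paths.extend(root_to_leaf_paths(child, prefix))
    return paths


tree = ast.parse(SNIPPET)
sequence = preorder_types(tree)    # sequence view: no explicit edges remain
paths = root_to_leaf_paths(tree)   # path view: one tuple per leaf

print(" ".join(sequence))
for path in paths:
    print(" -> ".join(path))
```

Both views are lossy in the ways the abstract describes, which is the gap a multi-scale extraction over the full tree is meant to address.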
