ASTRO: An AST-Assisted Approach for Generalizable Neural Clone Detection

Paper Title

ASTRO: An AST-Assisted Approach for Generalizable Neural Clone Detection

Authors

Yifan Zhang, Junwen Yang, Haoyu Dong, Qingchen Wang, Huajie Shao, Kevin Leach, Yu Huang

Abstract

Neural clone detection has attracted the attention of software engineering researchers and practitioners. However, most neural clone detection methods do not generalize beyond the scope of clones that appear in the training dataset. This results in poor model performance, especially in terms of model recall. In this paper, we present an Abstract Syntax Tree (AST) assisted approach for generalizable neural clone detection, or ASTRO, a framework for finding clones in codebases reflecting industry practices. We present three main components: (1) an AST-inspired representation for source code that leverages program structure and semantics, (2) a global graph representation that captures the context of an AST among a corpus of programs, and (3) a graph embedding for programs that, in combination with extant large-scale language models, improves state-of-the-art code clone detection. Our experimental results show that ASTRO improves state-of-the-art neural clone detection approaches in both recall and F-1 scores.
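
To make the idea of an AST-based view of source code concrete, the following is a minimal sketch using Python's standard `ast` module. It is not ASTRO's actual representation, which the abstract does not specify; the helper `node_type_sequence` and the two snippets are hypothetical and chosen only for illustration. The sketch shows that two functions differing only in identifier names share the same sequence of AST node types, which is the kind of structural signal a purely token-based comparison would miss.

```python
import ast


def node_type_sequence(source: str) -> list[str]:
    """Return the AST node type names of `source` in the order ast.walk visits them.

    Hypothetical helper for illustration; not part of ASTRO.
    """
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


# Two hypothetical snippets that differ only in identifier names.
snippet_a = "def add(a, b):\n    return a + b\n"
snippet_b = "def total(x, y):\n    return x + y\n"

# Identifier names are stored as node attributes, not node types,
# so renaming does not change the node type sequence.
same_structure = node_type_sequence(snippet_a) == node_type_sequence(snippet_b)
print(same_structure)
```

A node type sequence discards identifiers and literal values entirely, so it is only a crude stand-in for the richer structural and semantic representation the abstract describes.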
