Paper Title
Evaluation of Siamese Networks for Semantic Code Search
Paper Authors
Paper Abstract
With the growing number of open repositories and discussion forums, the use of natural language for semantic code search has become increasingly common. The accuracy of the results returned by such systems, however, can be low due to 1) the limited vocabulary shared between code and user queries and 2) an inadequate semantic understanding of the user query and its relation to code syntax. Siamese networks are well suited to learning such joint relations between data, but have not been explored in the context of code search. In this work, we evaluate Siamese networks for this task by exploring multiple extraction network architectures. These networks independently process code and text descriptions before passing them to a Siamese network that learns embeddings in a common space. We experiment on two different datasets and find that Siamese networks can act as strong regularizers on networks that extract rich information from code and text, which in turn helps achieve strong performance on code search, beating previous baselines on two programming languages. We also analyze the embedding space of these networks and provide directions for fully leveraging the power of Siamese networks for semantic code search.
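The pipeline described in the abstract (separate extraction networks for code and text, followed by a shared Siamese branch that maps both into a common space) can be illustrated with a minimal, untrained sketch. Everything in it is an assumption made for illustration and not the authors' architecture: the hashed bag-of-tokens `extract` function stands in for a real extraction network, the dimensions `DIM_IN` and `DIM_OUT` are arbitrary, the shared weights are random where a real system would learn them, and cosine similarity is one plausible choice of ranking score.

```python
import math
import random
import zlib

DIM_IN = 64   # hypothetical size of the extractor output
DIM_OUT = 16  # hypothetical size of the shared embedding space


def extract(tokens, salt):
    """Stand-in for an extraction network: a hashed bag-of-tokens vector.

    `salt` keeps the code extractor and the text extractor independent,
    mirroring the separate processing of code and text descriptions.
    """
    vec = [0.0] * DIM_IN
    for tok in tokens:
        bucket = zlib.crc32((salt + tok).encode("utf-8")) % DIM_IN
        vec[bucket] += 1.0
    return vec


def make_shared_weights(seed=0):
    """Random weights for the shared branch (assumption: learned in practice)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM_IN)] for _ in range(DIM_OUT)]


def siamese_embed(features, weights):
    """Shared branch: the SAME weights embed both code and text features."""
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_snippets(query_tokens, snippets, weights):
    """Score every code snippet against the query in the common space."""
    query_vec = siamese_embed(extract(query_tokens, "text:"), weights)
    scored = []
    for name, code_tokens in snippets.items():
        code_vec = siamese_embed(extract(code_tokens, "code:"), weights)
        scored.append((cosine(query_vec, code_vec), name))
    return sorted(scored, reverse=True)  # highest similarity first
```

Because the weights here are random, the ranking carries no meaning; the sketch only shows the data flow. In a trained system, the shared weights (and the extractors) would be fitted so that matching code and description pairs land close together in the common space.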