Code Generation Tools (Almost) for Free? A Study of Few-Shot, Pre-Trained Language Models on Code

Paper Title

Code Generation Tools (Almost) for Free? A Study of Few-Shot, Pre-Trained Language Models on Code

Authors

Patrick Bareiß, Beatriz Souza, Marcelo d'Amorim, Michael Pradel

Abstract

Few-shot learning with large-scale, pre-trained language models is a powerful way to answer questions about code, e.g., how to complete a given code example, or even how to generate code snippets from scratch. The success of these models raises the question of whether they could serve as a basis for building a wide range of code generation tools. Traditionally, such tools are built manually and separately for each task. Instead, few-shot learning may make it possible to obtain different tools from a single pre-trained language model by simply providing a few examples or a natural language description of the expected tool behavior. This paper studies to what extent a state-of-the-art, pre-trained language model of code, Codex, may serve this purpose. We consider three code manipulation and code generation tasks targeted by a range of traditional tools: (i) code mutation; (ii) test oracle generation from natural language documentation; and (iii) test case generation. For each task, we compare few-shot learning to a manually built tool. Our results show that the model-based tools complement (code mutation), are on par with (test oracle generation), or even outperform (test case generation) their respective traditionally built tools, while requiring far less effort to develop. By comparing the effectiveness of different variants of the model-based tools, we provide insights on how to design an appropriate input ("prompt") to the model and what influence the size of the model has. For example, we find that providing a small natural language description of the code generation task is an easy way to improve predictions. Overall, we conclude that few-shot language models are surprisingly effective, yet there is still more work to be done, such as exploring more diverse ways of prompting and tackling even more involved tasks.
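To make the idea of a few-shot "prompt" concrete, the following is a minimal sketch of how a short natural language task description and a handful of examples might be combined into a single model input. It is an illustration only and not the prompt format used in the paper: the function name `build_few_shot_prompt`, the `Input`/`Output` labels, the task description wording, and the code-mutation examples are all assumptions. The sketch only builds the prompt text and does not call any model.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    task_description: str,
    examples: List[Tuple[str, str]],
    query: str,
    input_label: str = "Input",    # hypothetical label, not from the paper
    output_label: str = "Output",  # hypothetical label, not from the paper
) -> str:
    """Assemble a prompt: task description, worked examples, then the open query."""
    parts = [task_description.strip(), ""]
    for source, target in examples:
        parts.append(f"{input_label}: {source}")
        parts.append(f"{output_label}: {target}")
        parts.append("")
    # The final entry leaves the output empty so that the model completes it.
    parts.append(f"{input_label}: {query}")
    parts.append(f"{output_label}:")
    return "\n".join(parts)


# Hypothetical code-mutation examples (original line, mutated line).
mutation_examples = [
    ("if (a > b) return a;", "if (a >= b) return a;"),
    ("int total = x + y;", "int total = x - y;"),
]

prompt = build_few_shot_prompt(
    "Generate a mutation of the given line of code.",
    mutation_examples,
    "while (i < n) i++;",
)
print(prompt)
```

Swapping the description and the example pairs is all it takes to target a different task under this scheme, which is the sense in which a single pre-trained model could stand in for several separately built tools.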
