Paper Title

Fault-Aware Neural Code Rankers

Paper Authors

Jeevana Priya Inala, Chenglong Wang, Mei Yang, Andres Codas, Mark Encarnación, Shuvendu K. Lahiri, Madanlal Musuvathi, Jianfeng Gao

Paper Abstract

Large language models (LLMs) have demonstrated an impressive ability to generate code for various programming tasks. In many instances, LLMs can generate a correct program for a task when given numerous trials. Consequently, a recent trend is to perform large-scale sampling of programs with a model and then filter/rank the programs based on their execution against a small number of known unit tests to select one candidate solution. However, these approaches assume that unit tests are given and that the generated programs can be safely executed (they can perform arbitrary dangerous operations such as file manipulations). Both of these assumptions are impractical in real-world software development. In this paper, we propose CodeRanker, a neural ranker that can predict the correctness of a sampled program without executing it. Our CodeRanker is fault-aware, i.e., it is trained to predict different kinds of execution information, such as the exact compile/runtime error type (e.g., an IndexError or a TypeError). We show that CodeRanker can significantly increase the pass@1 accuracy of various code generation models (including Codex, GPT-Neo, and GPT-J) on the APPS, HumanEval, and MBPP datasets.
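To make the execution-free ranking setup concrete, below is a minimal Python sketch of how sampled programs could be reranked by a fault-aware classifier. The fault_classifier callable, the LABELS set, and rank_candidates are illustrative assumptions standing in for the trained CodeRanker model and its label space; they are not the paper's released code.

from typing import Callable, List, Tuple

# Example label space: "pass" plus fault categories such as compile errors
# and specific runtime error types (e.g., IndexError, TypeError).
# The exact categories here are an assumption for illustration.
LABELS = ["pass", "CompileError", "IndexError", "TypeError", "WrongAnswer"]

def rank_candidates(
    task_description: str,
    candidates: List[str],
    fault_classifier: Callable[[str, str], List[float]],
) -> List[Tuple[str, float]]:
    """Score each sampled program by its predicted probability of passing,
    without executing any candidate, and return the candidates best-first."""
    scored = []
    for program in candidates:
        probs = fault_classifier(task_description, program)  # one probability per label
        p_pass = probs[LABELS.index("pass")]
        scored.append((program, p_pass))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

Under this kind of scheme, pass@1 is measured on the single top-ranked candidate rather than on a randomly chosen sample, which is how an execution-free ranker can improve pass@1 without running any unit tests.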
