Paper Title

Incorporating External Knowledge through Pre-training for Natural Language to Code Generation

Authors

Frank F. Xu, Zhengbao Jiang, Pengcheng Yin, Bogdan Vasilescu, Graham Neubig

Abstract

Open-domain code generation aims to generate code in a general-purpose programming language (such as Python) from natural language (NL) intents. Motivated by the intuition that developers usually retrieve resources on the web when writing code, we explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: automatically mined NL-code pairs from the online programming QA forum StackOverflow and programming language API documentation. Our evaluations show that combining the two sources with data augmentation and retrieval-based data re-sampling improves the current state-of-the-art by up to 2.2% absolute BLEU score on the code generation testbed CoNaLa. The code and resources are available at https://github.com/neulab/external-knowledge-codegen.
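The abstract names retrieval-based data re-sampling but does not describe how it works. The sketch below shows one plausible reading, and it rests on stated assumptions. It assumes a toy word-overlap (Jaccard) retriever in place of whatever retriever the paper uses. It counts how often each API-documentation (NL, code) pair is retrieved by the task's intents, then draws documentation pairs in proportion to those counts. Finally, it appends the drawn pairs to the mined StackOverflow pairs as augmentation data. The helper names (`retrieve_top_k`, `resample`, `build_pretraining_set`) and the example data are hypothetical. This is not the authors' implementation, and the real weighting and training pipeline may differ.

```python
import random
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical sketch, NOT the authors' code: an (NL description, code snippet) pair.
Pair = Tuple[str, str]


def tokens(text: str) -> set:
    """Lower-cased word tokens; a toy stand-in for a real retriever's analyzer."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_top_k(intent: str, doc_pairs: List[Pair], k: int = 1) -> List[int]:
    """Indices of the k doc pairs whose NL side overlaps most with the intent."""
    q = tokens(intent)
    scores = [jaccard(q, tokens(nl)) for nl, _ in doc_pairs]
    # sorted() is stable, so ties keep their original document order.
    ranked = sorted(range(len(doc_pairs)), key=lambda i: scores[i], reverse=True)
    # Drop zero-overlap hits so unrelated docs are never counted.
    return [i for i in ranked[:k] if scores[i] > 0]


def resample(intents: List[str], doc_pairs: List[Pair], n_samples: int,
             k: int = 1, seed: int = 0) -> List[Pair]:
    """Draw doc pairs with probability proportional to how often intents retrieve them."""
    counts = Counter()
    for intent in intents:
        counts.update(retrieve_top_k(intent, doc_pairs, k))
    if not counts:
        return []  # nothing retrieved, nothing to sample
    indices = list(counts)
    weights = [counts[i] for i in indices]
    rng = random.Random(seed)
    chosen = rng.choices(indices, weights=weights, k=n_samples)
    return [doc_pairs[i] for i in chosen]


def build_pretraining_set(mined_pairs: List[Pair], intents: List[str],
                          doc_pairs: List[Pair], n_doc_samples: int) -> List[Pair]:
    """Augment mined StackOverflow pairs with re-sampled API-doc pairs."""
    return list(mined_pairs) + resample(intents, doc_pairs, n_doc_samples)


if __name__ == "__main__":
    # Hypothetical toy data for illustration only.
    doc_pairs = [
        ("Sort a list in place.", "lst.sort()"),
        ("Return the length of an object.", "len(obj)"),
        ("Split a string by a separator.", "s.split(sep)"),
    ]
    intents = ["sort list x in place", "split string s by comma", "sort the list"]
    mined = [("reverse list x", "x[::-1]")]
    for nl, code in build_pretraining_set(mined, intents, doc_pairs, n_doc_samples=6):
        print(f"{nl!r} -> {code}")
```

In this toy run, no intent retrieves the `len` documentation pair, so it is never drawn. The `sort` pair is retrieved twice and the `split` pair once, so sampling weights them 2:1. In this sketch, re-sampling biases the documentation data toward APIs that the target intents actually seem to need, instead of using every documentation entry uniformly.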