Paper Title

Parsel: Algorithmic Reasoning with Language Models by Composing Decompositions

Authors

Eric Zelikman, Qian Huang, Gabriel Poesia, Noah D. Goodman, Nick Haber

Abstract

Despite recent success in large language model (LLM) reasoning, LLMs struggle with hierarchical multi-step reasoning tasks like generating complex programs. For these tasks, humans often start with a high-level algorithmic design and implement each part gradually. We introduce Parsel, a framework enabling automatic implementation and validation of complex algorithms with code LLMs. With Parsel, we automatically decompose algorithmic tasks into hierarchical natural language function descriptions and then search over combinations of possible function implementations using tests. We show that Parsel can be used across domains requiring hierarchical reasoning, including program synthesis and robotic planning. We find that, using Parsel, LLMs solve more competition-level problems in the APPS dataset, resulting in pass rates over 75% higher than prior results from directly sampling AlphaCode and Codex, while often using a smaller sample budget. Moreover, with automatically generated tests, we find that Parsel can improve the state-of-the-art pass@1 performance on HumanEval from 67% to 85%. We also find that LLM-generated robotic plans using Parsel are more than twice as likely to be considered accurate as directly generated plans. Lastly, we explore how Parsel addresses LLM limitations and discuss how Parsel may be useful for human programmers. We release our code at https://github.com/ezelikman/parsel
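The abstract's core idea, searching over combinations of candidate function implementations and keeping one that passes tests, can be illustrated with a minimal sketch. This is not Parsel's actual code or API: the names `CANDIDATES`, `TESTS`, `passes_tests`, and `search` are hypothetical, and the hard-coded candidate strings stand in for samples that a code LLM would generate from natural language function descriptions.

```python
import itertools

# Hypothetical candidates per function; in practice these would be LLM samples.
CANDIDATES = {
    "square": [
        "def square(x):\n    return x + x\n",   # incorrect candidate
        "def square(x):\n    return x * x\n",   # correct candidate
    ],
    "sum_of_squares": [
        "def sum_of_squares(xs):\n    return sum(xs)\n",  # incorrect candidate
        "def sum_of_squares(xs):\n    return sum(square(x) for x in xs)\n",
    ],
}

# Tests on the top-level function only: (positional args, expected result).
TESTS = [(([1, 2, 3],), 14), (([],), 0), (([2],), 4)]


def passes_tests(sources, entry_point, tests):
    """Load one implementation per function into a shared namespace and run the tests."""
    namespace = {}
    try:
        for src in sources:
            exec(src, namespace)  # assumption: candidates are trusted toy code
        fn = namespace[entry_point]
        return all(fn(*args) == expected for args, expected in tests)
    except Exception:
        # A crashing combination is treated as a failed combination.
        return False


def search(candidates, entry_point, tests):
    """Try every combination of candidates; return the first that passes all tests."""
    names = list(candidates)
    for combo in itertools.product(*(candidates[n] for n in names)):
        if passes_tests(combo, entry_point, tests):
            return dict(zip(names, combo))
    return None


solution = search(CANDIDATES, "sum_of_squares", TESTS)
```

Here `solution` maps each function name to the source of the implementation chosen for it. The exhaustive product grows exponentially with the number of functions, so a real system would need pruning or per-function tests; this sketch only shows the shape of the search.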
