Emergent Abilities of Large Language Models

Paper Title

Emergent Abilities of Large Language Models

Authors

Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus

Abstract

Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. The existence of such emergence implies that additional scaling could further expand the range of capabilities of language models.
