Paper Title

Language models are better than humans at next-token prediction

Paper Authors

Buck Shlegeris, Fabien Roger, Lawrence Chan, Euan McLean

Abstract

Current language models are considered to have sub-human capabilities at natural language tasks like question-answering or writing code. However, language models are not trained to perform well at these tasks; they are trained to accurately predict the next token given the previous tokens in tokenized text. It is not clear whether language models are better or worse than humans at next-token prediction. To answer this question, we performed two distinct experiments to directly compare humans and language models on this front: one measuring top-1 accuracy and the other measuring perplexity. In both experiments, we find humans to be consistently \emph{worse} than even relatively small language models like GPT3-Ada at next-token prediction.
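The two metrics named in the abstract can be illustrated with a minimal sketch. The toy bigram "model", corpus, and helper names below are assumptions for illustration only, not the paper's actual experimental setup: top-1 accuracy counts how often the model's single most likely next token matches the true one, and perplexity exponentiates the average negative log-probability the model assigns to the true next token.

```python
import math
from collections import Counter, defaultdict

# Toy corpus and bigram model (illustrative assumption, not the paper's setup).
corpus = "the cat sat on the mat the cat ran".split()

# Estimate P(next | prev) from bigram counts in the corpus.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_dist(prev):
    """Return the model's distribution over the next token given `prev`."""
    counts = bigrams[prev]
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def top1_accuracy(tokens):
    """Fraction of positions where the model's argmax token is the actual next token."""
    hits = 0
    for prev, actual in zip(tokens, tokens[1:]):
        dist = predict_dist(prev)
        guess = max(dist, key=dist.get)
        hits += (guess == actual)
    return hits / (len(tokens) - 1)

def perplexity(tokens):
    """exp of the mean negative log-probability assigned to each actual next token."""
    nll = 0.0
    for prev, actual in zip(tokens, tokens[1:]):
        p = predict_dist(prev).get(actual, 1e-12)  # tiny floor for unseen pairs
        nll += -math.log(p)
    return math.exp(nll / (len(tokens) - 1))
```

Lower perplexity and higher top-1 accuracy both mean better next-token prediction; the paper's comparison asks which of a human or a language model scores better on the same text.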
