Paper Title

Current Limitations of Language Models: What You Need is Retrieval

Paper Authors

Komatsuzaki, Aran

Paper Abstract

We classify and re-examine some of the current approaches to improving the performance-compute trade-off of language models, including (1) non-causal models (such as masked language models), (2) extension of batch length with efficient attention, (3) recurrence, (4) conditional computation, and (5) retrieval. We identify some limitations that (1)-(4) suffer from. For example, (1) currently struggles with open-ended text generation where the output is only loosely constrained by the input, as well as with performing general textual tasks in the manner of GPT-2/3, due to its need for a specific fine-tuning dataset. (2) and (3) do not improve the prediction of the first $\sim 10^3$ tokens. Scaling up model size (e.g., efficiently with (4)) still results in poor performance scaling on some tasks. We argue that (5) would resolve many of these limitations, since it can (a) reduce the amount of supervision and (b) efficiently extend the context over the entire training dataset and the entire past of the current sample. We speculate how to modify MARGE to perform unsupervised causal modeling that achieves (b) with the retriever jointly trained.
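To make approach (5) concrete, below is a minimal Python sketch of how retrieval can extend a causal language model's context over the training corpus: embed all training documents, retrieve the nearest neighbours of the current prefix, and prepend them to the prompt before next-token prediction. This is an illustrative assumption, not the MARGE-based method the abstract speculates about; the `RetrievalAugmentedContext` helper and the toy hashed bag-of-words embedding are hypothetical stand-ins for a learned retriever.

```python
# Illustrative sketch of retrieval-extended context for a causal LM.
# Assumptions (not from the paper): a toy hashed bag-of-words embedding
# and a fixed (non-learned) retriever over an in-memory corpus.

import numpy as np


def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy embedding: hashed bag-of-words, L2-normalised."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


class RetrievalAugmentedContext:
    """Retrieves nearest-neighbour training documents for the current prefix."""

    def __init__(self, corpus: list[str]):
        self.corpus = corpus
        # Precompute one embedding per training document: shape (N, dim).
        self.index = np.stack([embed(doc) for doc in corpus])

    def retrieve(self, prefix: str, k: int = 2) -> list[str]:
        """Return the k corpus documents most similar to the prefix (cosine)."""
        scores = self.index @ embed(prefix)
        top = np.argsort(-scores)[:k]
        return [self.corpus[i] for i in top]

    def build_prompt(self, prefix: str, k: int = 2) -> str:
        """Prepend retrieved documents to the prefix; a causal LM would then
        condition on this extended context when predicting the next tokens."""
        return "\n".join(self.retrieve(prefix, k) + [prefix])


if __name__ == "__main__":
    corpus = [
        "Masked language models need a fine-tuning dataset for downstream tasks.",
        "Efficient attention extends the batch length of transformers.",
        "Retrieval extends the context over the entire training dataset.",
    ]
    rac = RetrievalAugmentedContext(corpus)
    print(rac.build_prompt("retrieval can extend the context of a language model"))
```

In the abstract's proposal the retriever would be trained jointly with the language model rather than fixed as in this sketch.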
