Paper Title
Parallel Context Windows for Large Language Models
Paper Authors
Paper Abstract
When applied to processing long text, Large Language Models (LLMs) are limited by their context window. Existing efforts to address this limitation involve training specialized architectures, and cannot be easily applied to off-the-shelf LLMs. We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training. The key to the approach is to carve a long context into chunks ("windows"), restrict the attention mechanism to apply only within each window, and re-use the positional embeddings across the windows. Our main results test the PCW approach on in-context learning with models that range in size between 750 million and 178 billion parameters, and show substantial improvements for tasks with diverse input and output spaces. We show additional benefits in other settings where long context windows may be beneficial: multi-hop questions and retrieval-augmented question answering with multiple retrieved documents. Our results highlight Parallel Context Windows as a promising method for applying off-the-shelf LLMs in a range of settings that require long text sequences. We make our code publicly available at https://github.com/ai21labs/parallel-context-windows.
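The abstract describes the mechanism at a high level: causal attention is confined to each window, positional embeddings are re-used across windows, and the task tokens at the end attend to all windows. As a purely illustrative aid, the following NumPy sketch shows one way such a block-wise attention mask and re-used position ids could be constructed; the function and variable names are our own and this is not the authors' released implementation (see the repository linked above for that).

```python
import numpy as np

def pcw_mask_and_positions(window_lengths, task_length):
    """Illustrative sketch of the Parallel Context Windows idea.

    - Tokens inside a window attend causally only to tokens of the same window.
    - Task tokens at the end attend to every window and, causally, to earlier
      task tokens.
    - Position ids restart at 0 in every window, so the same positional
      embeddings are re-used across windows; task tokens continue from the
      longest window.
    """
    total = sum(window_lengths) + task_length
    mask = np.zeros((total, total), dtype=bool)   # True = attention allowed
    position_ids = np.zeros(total, dtype=int)

    offset = 0
    for length in window_lengths:
        # Causal (lower-triangular) attention within this window only.
        mask[offset:offset + length, offset:offset + length] = np.tril(
            np.ones((length, length), dtype=bool)
        )
        # Re-used positions: every window starts again at 0.
        position_ids[offset:offset + length] = np.arange(length)
        offset += length

    # Task tokens: causal over the whole sequence, i.e. they see all windows.
    start = sum(window_lengths)
    for i in range(task_length):
        mask[start + i, :start + i + 1] = True
    position_ids[start:] = max(window_lengths) + np.arange(task_length)

    return mask, position_ids

if __name__ == "__main__":
    mask, pos = pcw_mask_and_positions(window_lengths=[3, 3], task_length=2)
    print(pos)                 # [0 1 2 0 1 2 3 4]
    print(mask.astype(int))    # block-diagonal windows + fully-attending task rows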