Paper Title

Investigating Efficiently Extending Transformers for Long Input Summarization

Paper Authors

Jason Phang, Yao Zhao, Peter J. Liu

Paper Abstract

While large pretrained Transformer models have proven highly capable at tackling natural language tasks, handling long sequence inputs continues to be a significant challenge. One such task is long input summarization, where inputs are longer than the maximum input context of most pretrained models. Through an extensive set of experiments, we investigate what model architectural changes and pretraining paradigms can most efficiently adapt a pretrained Transformer for long input summarization. We find that a staggered, block-local Transformer with global encoder tokens strikes a good balance of performance and efficiency, and that an additional pretraining phase on long sequences meaningfully improves downstream summarization performance. Based on our findings, we introduce PEGASUS-X, an extension of the PEGASUS model with additional long input pretraining to handle inputs of up to 16K tokens. PEGASUS-X achieves strong performance on long input summarization tasks comparable with much larger models while adding few additional parameters and not requiring model parallelism to train.
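To make the architectural finding above more concrete, here is a minimal NumPy sketch of block-local attention with global encoder tokens, including the half-block stagger between alternating layers that the abstract refers to. The block size, number of global tokens, and stagger offset used here are illustrative assumptions, not the paper's exact configuration or implementation.

```python
# Minimal sketch of single-head, block-local attention with global tokens.
# Local tokens attend only within their block plus all global tokens;
# global tokens attend to the full sequence. Illustrative only.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def block_local_attention_with_global(x, g, block_size, stagger=False):
    """x: (seq_len, d) local token states; g: (num_global, d) global token states."""
    seq_len, d = x.shape
    scale = 1.0 / np.sqrt(d)

    # Staggered layers shift block boundaries by half a block so that
    # information can cross block edges in alternating layers.
    offset = block_size // 2 if stagger else 0

    out_local = np.zeros_like(x)
    for start in range(-offset, seq_len, block_size):
        lo, hi = max(start, 0), min(start + block_size, seq_len)
        if lo >= hi:
            continue
        block = x[lo:hi]                      # (b, d)
        kv = np.concatenate([block, g], 0)    # keys/values: block + globals
        scores = block @ kv.T * scale         # (b, b + num_global)
        out_local[lo:hi] = softmax(scores) @ kv

    # Global tokens attend to all local tokens plus themselves.
    kv_all = np.concatenate([x, g], 0)
    out_global = softmax(g @ kv_all.T * scale) @ kv_all
    return out_local, out_global

# Tiny usage example with random features.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))   # 16 local tokens, dim 8
g = rng.normal(size=(2, 8))    # 2 global tokens
y_local, y_global = block_local_attention_with_global(x, g, block_size=4, stagger=True)
print(y_local.shape, y_global.shape)   # (16, 8) (2, 8)
```

Because each local token only attends to its own block plus a small set of global tokens, the attention cost grows roughly linearly in sequence length rather than quadratically, which is what makes extending the encoder to 16K-token inputs tractable.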
