Paper Title

What Do Position Embeddings Learn? An Empirical Study of Pre-Trained Language Model Positional Encoding

Paper Authors

Yu-An Wang, Yun-Nung Chen

Paper Abstract

In recent years, pre-trained Transformers have dominated the majority of NLP benchmark tasks. Many variants of pre-trained Transformers keep emerging, and most focus on designing different pre-training objectives or variants of self-attention. Embedding position information in the self-attention mechanism is also an indispensable component of Transformers, yet it is often treated casually. Therefore, this paper carries out an empirical study on the position embeddings of mainstream pre-trained Transformers, focusing on two questions: 1) Do position embeddings really learn the meaning of positions? 2) How do these differently learned position embeddings affect Transformers on NLP tasks? This paper focuses on providing new insight into pre-trained position embeddings through feature-level analysis and empirical experiments on most of the iconic NLP tasks. We believe our experimental results can guide future work in choosing a suitable positional encoding function for specific tasks, given the properties of the application.
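For readers unfamiliar with the two positional encoding families the abstract contrasts, the following is a minimal sketch (not taken from the paper) of the fixed sinusoidal encoding from the original Transformer versus a learned position-embedding table as used by models such as BERT and GPT-2, together with a simple feature-level probe in the spirit of the paper's analysis. Function names and parameters here are illustrative assumptions, not the authors' code.

```python
# Minimal sketch: sinusoidal vs. learned position embeddings (illustrative only).
import numpy as np

def sinusoidal_position_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Fixed encoding: PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...)."""
    positions = np.arange(max_len)[:, None]          # (max_len, 1)
    dims = np.arange(d_model)[None, :]               # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                 # (max_len, d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

def learned_position_embedding(max_len: int, d_model: int, rng=None) -> np.ndarray:
    """A learned embedding is just a trainable lookup table; here we only show
    the random initialization that pre-training would subsequently update."""
    if rng is None:
        rng = np.random.default_rng(0)
    return rng.normal(scale=0.02, size=(max_len, d_model))

def cosine_similarity_to_first(pe: np.ndarray) -> np.ndarray:
    """Feature-level probe: cosine similarity between position 0 and every position."""
    first = pe[0]
    return pe @ first / (np.linalg.norm(pe, axis=1) * np.linalg.norm(first) + 1e-9)

if __name__ == "__main__":
    sin_pe = sinusoidal_position_encoding(max_len=512, d_model=768)
    learned_pe = learned_position_embedding(max_len=512, d_model=768)
    print("sinusoidal:", cosine_similarity_to_first(sin_pe)[:5])
    print("learned   :", cosine_similarity_to_first(learned_pe)[:5])
```

The sinusoidal table shows a smooth, distance-dependent similarity pattern by construction, whereas a freshly initialized learned table shows none; whether and how pre-training induces such positional structure is exactly the kind of question the paper studies empirically.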
