Paper Title

Efficient Representation Learning via Adaptive Context Pooling

Authors

Chen Huang, Walter Talbott, Navdeep Jaitly, Josh Susskind

Abstract

Self-attention mechanisms model long-range context by using pairwise attention between all input tokens. In doing so, they assume a fixed attention granularity defined by the individual tokens (e.g., text characters or image pixels), which may not be optimal for modeling complex dependencies at higher levels. In this paper, we propose ContextPool to address this problem by adapting the attention granularity for each token. Inspired by the success of ConvNets that are combined with pooling to capture long-range dependencies, we learn to pool neighboring features for each token before computing attention in a given attention layer. The pooling weights and support size are adaptively determined, allowing the pooled features to encode meaningful context with varying scale. We show that ContextPool makes attention models more expressive, achieving strong performance often with fewer layers and thus significantly reduced cost. Experiments validate that our ContextPool module, when plugged into transformer models, matches or surpasses state-of-the-art performance using less compute on several language and image benchmarks, outperforms recent works with learned context sizes or sparse attention patterns, and is also applicable to ConvNets for efficient feature learning.
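
To make the mechanism described in the abstract concrete, below is a minimal PyTorch-style sketch of the idea: for each token, predict pooling weights over a local window and a scalar support size, pool neighboring features with those weights, and only then compute attention on the pooled tokens. The class name `ContextPool`, the `max_support` window, the `weight_proj`/`size_proj` heads, and the distance-decay soft mask are illustrative assumptions, not the authors' exact design.

```python
# A minimal sketch of per-token adaptive context pooling, assuming a PyTorch-style
# module. Parameterization details (window size, distance-decay mask) are assumptions
# made for illustration; they are not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextPool(nn.Module):
    """Pools neighboring features per token with predicted weights and support size."""

    def __init__(self, d_model: int, max_support: int = 9):
        super().__init__()
        assert max_support % 2 == 1, "use an odd window so each token sits at the center"
        self.max_support = max_support
        # Per-token unnormalized pooling weights over a local window.
        self.weight_proj = nn.Linear(d_model, max_support)
        # Per-token scalar that controls the effective support size.
        self.size_proj = nn.Linear(d_model, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        half = self.max_support // 2
        # Gather a local window around every token: (batch, seq_len, max_support, d_model).
        x_pad = F.pad(x, (0, 0, half, half))
        windows = x_pad.unfold(1, self.max_support, 1).permute(0, 1, 3, 2)

        logits = self.weight_proj(x)                      # (b, n, max_support)
        size = torch.sigmoid(self.size_proj(x))           # (b, n, 1), in (0, 1)
        # Distance of each window slot from the center token.
        offsets = torch.arange(-half, half + 1, device=x.device, dtype=x.dtype).abs()
        # Positions beyond the predicted support are penalized before the softmax,
        # giving each token its own pooling weights and effective scale.
        decay = offsets / (size * half + 1e-6)            # (b, n, max_support)
        weights = torch.softmax(logits - decay, dim=-1)

        # Weighted average of neighboring features: the context-pooled token.
        return torch.einsum('bnk,bnkd->bnd', weights, windows)


# Usage: pool features first, then run a standard attention layer on the pooled tokens.
x = torch.randn(2, 128, 256)
pool = ContextPool(d_model=256)
attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
h = pool(x)
out, _ = attn(h, h, h)
```

In the paper the pooling is applied inside a given attention layer before computing attention; the separate `MultiheadAttention` call above is only meant to show where the pooled features would be consumed.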
