Paper Title
Why and when should you pool? Analyzing Pooling in Recurrent Architectures
Paper Authors
Paper Abstract
Pooling-based recurrent neural architectures consistently outperform their counterparts without pooling. However, the reasons for their enhanced performance are largely unexamined. In this work, we examine three commonly used pooling techniques (mean-pooling, max-pooling, and attention), and propose max-attention, a novel variant that effectively captures interactions among predictive tokens in a sentence. We find that pooling-based architectures substantially differ from their non-pooling equivalents in their learning ability and positional biases, which elucidates their performance benefits. By analyzing gradient propagation, we discover that pooling facilitates better gradient flow compared to BiLSTMs. Further, we expose how BiLSTMs are positionally biased towards tokens at the beginning and end of a sequence; pooling alleviates such biases. Consequently, we identify settings where pooling offers large benefits: (i) in low-resource scenarios, and (ii) when important words lie towards the middle of the sentence. Among the pooling techniques studied, max-attention is the most effective, resulting in significant performance gains on several text classification tasks.
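For orientation, the three standard pooling techniques named in the abstract can be sketched as follows. This is a minimal illustration over a matrix of token representations (e.g. BiLSTM outputs), assuming the common formulations of mean-pooling, max-pooling, and query-vector attention; it does not reproduce the paper's proposed max-attention variant, whose exact formulation is defined in the paper itself.

```python
import numpy as np

def mean_pool(h):
    # h: (seq_len, hidden) token representations; average over time
    return h.mean(axis=0)

def max_pool(h):
    # element-wise max over time: each dimension keeps its largest value
    return h.max(axis=0)

def attention_pool(h, w):
    # Standard attention pooling with a learned query vector w
    # (an assumption: a common formulation, not the paper's max-attention).
    scores = h @ w                       # (seq_len,) unnormalized scores
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                 # softmax weights over tokens
    return alpha @ h                     # weighted sum of token vectors
```

With a query vector of zeros, attention weights are uniform and `attention_pool` reduces to `mean_pool`, which is a convenient sanity check.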