Paper Title

Analyzing Redundancy in Pretrained Transformer Models

Paper Authors

Fahim Dalvi, Hassan Sajjad, Nadir Durrani, Yonatan Belinkov

Paper Abstract

Transformer-based deep NLP models are trained using hundreds of millions of parameters, limiting their applicability in computationally constrained environments. In this paper, we study the cause of these limitations by defining a notion of Redundancy, which we categorize into two classes: General Redundancy and Task-specific Redundancy. We dissect two popular pretrained models, BERT and XLNet, studying how much redundancy they exhibit at a representation-level and at a more fine-grained neuron-level. Our analysis reveals interesting insights, such as: i) 85% of the neurons across the network are redundant and ii) at least 92% of them can be removed when optimizing towards a downstream task. Based on our analysis, we present an efficient feature-based transfer learning procedure, which maintains 97% performance while using at-most 10% of the original neurons.
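
Below is a minimal sketch of the kind of feature-based transfer learning the abstract describes: freeze the pretrained encoder, extract neuron activations as features, keep only a small fraction of the neurons, and train a lightweight classifier on top. This is not the authors' exact procedure; the neuron-selection step (variance ranking) and the toy dataset are simple stand-ins for illustration, and the model name and helper functions are assumptions.

```python
# Sketch: feature-based transfer learning with a reduced set of neurons.
# Assumes the `transformers`, `torch`, `numpy`, and `scikit-learn` packages.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()  # encoder stays frozen; only the classifier is trained

def extract_features(sentences):
    """Return one [CLS] activation vector per sentence from the frozen encoder."""
    feats = []
    with torch.no_grad():
        for text in sentences:
            inputs = tokenizer(text, return_tensors="pt", truncation=True)
            hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
            feats.append(hidden[0, 0].numpy())          # [CLS] vector
    return np.stack(feats)

# Hypothetical toy task data; replace with a real downstream dataset.
train_texts = ["a great movie", "a dull movie"]
train_labels = [1, 0]

X = extract_features(train_texts)

# Keep roughly 10% of the neurons (here: the highest-variance ones, as a
# stand-in for a proper redundancy-based selection) and fit a classifier.
k = int(0.1 * X.shape[1])
keep = np.argsort(X.var(axis=0))[-k:]
clf = LogisticRegression(max_iter=1000).fit(X[:, keep], train_labels)
print(clf.predict(extract_features(["a great movie"])[:, keep]))
```

The design point the abstract makes is that the classifier only ever sees the selected subset of neurons, so downstream training and inference cost scales with that subset rather than with the full representation.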
