Rethinking the Value of Transformer Components

Paper Title

Rethinking the Value of Transformer Components

Authors

Wenxuan Wang, Zhaopeng Tu

Abstract

The Transformer has become the state-of-the-art translation model, yet how each intermediate component contributes to model performance is not well studied, which poses significant challenges for designing optimal architectures. In this work, we bridge this gap by evaluating the impact of individual components (sub-layers) in trained Transformer models from different perspectives. Experimental results across language pairs, training strategies, and model capacities show that certain components are consistently more important than others. We also report a number of interesting findings that might help humans better analyze, understand, and improve Transformer models. Based on these observations, we further propose a new training strategy that can improve translation performance by distinguishing the unimportant components in training.
