Paper Title
AutoTrans: Automating Transformer Design via Reinforced Architecture Search
Paper Authors

Paper Abstract
Though transformer architectures have shown dominance in many natural language understanding tasks, there are still unsolved issues in the training of transformer models, especially the need for a principled warm-up schedule, which has proven important for stable transformer training, and the question of whether the task at hand prefers a scaled attention product or not. In this paper, we empirically explore automating the design choices in the transformer model, i.e., how to set the layer-norm, whether to scale the attention product, the number of layers, the number of heads, the activation function, etc., so that one can obtain a transformer architecture that better suits the task at hand. RL is employed to navigate the search space, and special parameter-sharing strategies are designed to accelerate the search. It is shown that sampling a proportion of the training data per epoch during the search helps to improve the search quality. Experiments on CoNLL03, Multi-30k, IWSLT14, and WMT-14 show that the searched transformer model can outperform the standard transformer. In particular, we show that our learned model can be trained more robustly with large learning rates and without warm-up.
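
The abstract describes a discrete search space over transformer design choices (layer-norm placement, attention scaling, depth, heads, activation) navigated by an RL controller, with a proportion of the training data sampled each search epoch. Below is a minimal Python sketch of such a search space and the per-epoch subsampling; the option names, candidate values, and the uniform samplers are illustrative assumptions, not the paper's actual configuration.

import random

# A minimal sketch of the kind of design-choice search space described in
# the abstract. The option names and candidate values are illustrative
# assumptions, not the paper's exact search space.
SEARCH_SPACE = {
    "layer_norm": ["pre", "post"],            # where to place layer normalization
    "scale_attn": [True, False],              # scale the attention product or not
    "num_layers": [2, 4, 6],                  # encoder/decoder depth
    "num_heads": [4, 8, 16],                  # attention heads per layer
    "activation": ["relu", "gelu", "swish"],  # feed-forward activation
}

def sample_architecture(rng: random.Random) -> dict:
    """Sample one transformer configuration from the search space.

    A real RL controller would replace this uniform sampler with a learned
    policy updated from validation-set rewards.
    """
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}

def sample_epoch_subset(dataset: list, proportion: float, rng: random.Random) -> list:
    """Sample a fixed proportion of the training data for one search epoch.

    The abstract reports that per-epoch subsampling improves search quality;
    uniform sampling without replacement here is an assumption.
    """
    k = max(1, int(len(dataset) * proportion))
    return rng.sample(dataset, k)

if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_architecture(rng))
    print(sample_epoch_subset(list(range(100)), proportion=0.3, rng=rng)[:5])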