Paper title

GLU Variants Improve Transformer

Author

Shazeer, Noam

Abstract

Gated Linear Units (arXiv:1612.08083) consist of the component-wise product of two linear projections, one of which is first passed through a sigmoid function. Variations on GLU are possible, using different nonlinear (or even linear) functions in place of sigmoid. We test these variants in the feed-forward sublayers of the Transformer (arXiv:1706.03762) sequence-to-sequence model, and find that some of them yield quality improvements over the typically-used ReLU or GELU activations.
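
As a rough illustration of the gating described in the abstract, the sketch below computes the component-wise product of two linear projections, with one projection passed through an activation function first. It is a minimal pure-Python sketch, not the paper's implementation: the names `gated_unit`, `w_gate`, `w_value`, and `matvec` are hypothetical, bias terms are left out for brevity, and vectors are plain lists rather than tensors. Swapping the `activation` argument gives the kind of variation the abstract mentions (sigmoid, ReLU, GELU, or a purely linear gate).

```python
import math


def matvec(x, w):
    """Project vector x (length d_in) with weight matrix w (d_in rows, d_out columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def gelu(z):
    # Exact GELU: z * Phi(z), where Phi is the standard normal CDF.
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))


def identity(z):
    # A linear "activation", giving a purely bilinear gate.
    return z


def gated_unit(x, w_gate, w_value, activation=sigmoid):
    """Component-wise product of two linear projections of x.

    The gate projection is passed through `activation` first; the default
    (sigmoid) corresponds to the original Gated Linear Unit.
    Assumption: bias terms are omitted.
    """
    gate = [activation(g) for g in matvec(x, w_gate)]
    value = matvec(x, w_value)
    return [g * v for g, v in zip(gate, value)]
```

Calling `gated_unit(x, w_gate, w_value)` gives the sigmoid-gated form, while passing `activation=relu`, `activation=gelu`, or `activation=identity` gives the corresponding variants. In a Transformer feed-forward sublayer, a unit like this would stand in for the first projection and its activation, followed by a second linear projection back to the model dimension.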
