Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One

Paper Title

Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One

Authors

Yezhen Wang, Tong Che, Bo Li, Kaitao Song, Hengzhi Pei, Yoshua Bengio, Dongsheng Li

Abstract

Autoregressive generative models are commonly used, especially for tasks involving sequential data. They have, however, been plagued by a number of inherent flaws due to the intrinsic characteristics of chain-style conditional modeling (e.g., exposure bias or lack of long-range coherence), which severely limit their ability to model distributions properly. In this paper, we propose a method termed E-ARM for training autoregressive generative models that takes advantage of a well-designed energy-based learning objective. By leveraging the extra degree of freedom of the softmax operation, we can make the autoregressive model itself an energy-based model for measuring the likelihood of the input without introducing any extra parameters. Furthermore, we show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem and increasing temporal coherence for autoregressive generative models. Extensive empirical results, covering benchmarks such as language modeling, neural machine translation, and image generation, demonstrate the effectiveness of the proposed approach.
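
One plausible reading of the "extra degree of freedom of the softmax operation" is the shift invariance of softmax: adding the same constant to every logit leaves the next-token distribution unchanged, so the overall scale of the logits (their log-sum-exp) is a scalar that the usual likelihood objective never constrains. The sketch below only illustrates that general property in plain Python; the function and variable names are hypothetical, and it is not the E-ARM training objective, which is defined in the paper itself.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logsumexp(logits):
    """Numerically stable log(sum(exp(logits)))."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))


# Hypothetical logits for one prediction step, and an arbitrary shift.
logits = [2.0, -1.0, 0.5]
shift = 3.7
shifted = [z + shift for z in logits]

probs = softmax(logits)
probs_shifted = softmax(shifted)

# The conditional distribution is identical under the shift...
same_distribution = all(
    math.isclose(p, q, rel_tol=1e-9) for p, q in zip(probs, probs_shifted)
)

# ...while the log-sum-exp of the logits moves by exactly the shift,
# so this scalar is free to carry additional information.
scalar_gap = logsumexp(shifted) - logsumexp(logits)

print(same_distribution, scalar_gap)
```

Because the softmax output ignores this scalar, a model could in principle assign it a meaning (for example, an unnormalized score of the input so far) without adding parameters, which is consistent with the abstract's claim.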
