Paper Title

Do sequence-to-sequence VAEs learn global features of sentences?

Authors

Tom Bosc, Pascal Vincent

Abstract

Autoregressive language models are powerful and relatively easy to train. However, these models are usually trained without explicit conditioning labels and do not offer easy ways to control global aspects such as sentiment or topic during generation. Bowman et al. (2016) adapted the Variational Autoencoder (VAE) for natural language with the sequence-to-sequence architecture and claimed that the latent vector was able to capture such global features in an unsupervised manner. We question this claim. We measure which words benefit most from the latent information by decomposing the reconstruction loss per position in the sentence. Using this method, we find that VAEs are prone to memorizing the first words and the sentence length, producing local features of limited usefulness. To alleviate this, we investigate alternative architectures based on bag-of-words assumptions and language model pretraining. These variants learn latent variables that are more global, i.e., more predictive of topic or sentiment labels. Moreover, using reconstructions, we observe that they decrease memorization: the first word and the sentence length are not recovered as accurately as with the baselines, consequently yielding more diverse reconstructions.
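
The diagnostic described in the abstract, decomposing the reconstruction loss per position, can be illustrated with a small sketch. The snippet below is a hypothetical illustration, not the authors' code: it assumes per-token negative log-likelihoods are already available for a latent-conditioned decoder and for a plain language model, and simply averages them by position to show which positions benefit most from the latent vector (the toy data and names such as per_position_loss are made up for illustration).

```python
# Hypothetical sketch of per-position reconstruction-loss decomposition.
# Assumes per-token negative log-likelihoods (NLLs) were already computed
# for (a) a decoder conditioned on the latent vector and (b) an
# unconditioned language model; positions where the conditioned loss is
# much lower are the ones that benefit most from the latent information.
import numpy as np

def per_position_loss(token_nll, max_len):
    """Mean NLL at each position 0..max_len-1 across sentences,
    ignoring sentences that are shorter than a given position."""
    sums = np.zeros(max_len)
    counts = np.zeros(max_len)
    for nll in token_nll:
        n = min(len(nll), max_len)
        sums[:n] += nll[:n]
        counts[:n] += 1
    return sums / np.maximum(counts, 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy per-token losses: the latent-conditioned decoder is made much
    # better on the first two positions (memorization) and barely better later.
    lm_nll = [rng.uniform(3.0, 5.0, size=rng.integers(5, 12)) for _ in range(100)]
    vae_nll = [np.maximum(x - np.array([2.5, 2.0] + [0.1] * (len(x) - 2)), 0.0)
               for x in lm_nll]
    gap = per_position_loss(lm_nll, 10) - per_position_loss(vae_nll, 10)
    for pos, g in enumerate(gap):
        print(f"position {pos}: loss reduction from latent ~ {g:.2f}")
```

On the toy data, the printed loss reduction is concentrated on the first positions, which mirrors the memorization effect the paper reports for sequence-to-sequence VAEs.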
