Paper Title
Exploiting Inductive Bias in Transformers for Unsupervised Disentanglement of Syntax and Semantics with VAEs
Paper Authors
Paper Abstract
We propose a generative model for text generation that exhibits disentangled latent representations of syntax and semantics. Contrary to previous work, this model does not need syntactic information such as constituency parses, or semantic information such as paraphrase pairs. Our model relies solely on the inductive bias found in attention-based architectures such as Transformers. In the attention of Transformers, keys handle information selection while values specify what information is conveyed. Our model, dubbed QKVAE, uses attention in its decoder to read latent variables, where one latent variable infers keys while another infers values. We run experiments on latent representations and on syntax/semantics transfer, which show that QKVAE displays clear signs of disentangled syntax and semantics. We also show that our model displays competitive syntax transfer capabilities when compared to supervised models, and that comparable supervised models need a fairly large amount of data (more than 50K samples) to outperform it on both syntactic and semantic transfer. The code for our experiments is publicly available.
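To make the key/value reading mechanism concrete, below is a minimal PyTorch sketch of a decoder cross-attention layer in which one latent variable supplies the keys and the other supplies the values, as the abstract describes. The class name, dimensions, number of latent slots, and single-layer setup are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the QKVAE authors' code): a single cross-attention layer
# where one latent variable (z_syn) produces the keys and the other (z_sem)
# produces the values, so keys select *which* latent slots are read and
# values determine *what* information is conveyed to the decoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentCrossAttention(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        self.query_proj = nn.Linear(d_model, d_model)  # decoder states -> queries
        self.key_proj = nn.Linear(d_model, d_model)    # z_syn -> keys
        self.value_proj = nn.Linear(d_model, d_model)  # z_sem -> values
        self.scale = d_model ** -0.5

    def forward(self, hidden, z_syn, z_sem):
        # hidden: (batch, seq_len, d_model) decoder hidden states
        # z_syn, z_sem: (batch, n_latent, d_model) latent vectors
        q = self.query_proj(hidden)
        k = self.key_proj(z_syn)
        v = self.value_proj(z_sem)
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v  # (batch, seq_len, d_model)

# Usage with arbitrary shapes (all sizes are illustrative):
layer = LatentCrossAttention()
h = torch.randn(2, 10, 256)      # decoder states for 2 sequences of length 10
z_syn = torch.randn(2, 4, 256)   # "syntax" latent: supplies keys
z_sem = torch.randn(2, 4, 256)   # "semantics" latent: supplies values
out = layer(h, z_syn, z_sem)     # -> (2, 10, 256)
```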