Paper Title

Direct Evolutionary Optimization of Variational Autoencoders With Binary Latents

Paper Authors

Enrico Guiraud, Jakob Drefs, Jörg Lücke

Paper Abstract

Discrete latent variables are considered important for real-world data, which has motivated research on Variational Autoencoders (VAEs) with discrete latents. However, standard VAE training is not possible in this case, which has prompted different strategies for manipulating discrete distributions so that discrete VAEs can be trained similarly to conventional ones. Here we ask if it is also possible to keep the discrete nature of the latents fully intact by applying a direct discrete optimization for the encoding model. The approach consequently diverges strongly from standard VAE training by sidestepping the sampling approximation, the reparameterization trick, and amortization. Discrete optimization is realized in a variational setting using truncated posteriors in conjunction with evolutionary algorithms. For VAEs with binary latents, we (A) show how such a discrete variational method ties into gradient ascent for network weights, and (B) how the decoder is used to select latent states for training. Conventional amortized training is more efficient and applicable to large neural networks. However, using smaller networks, we here find direct discrete optimization to scale efficiently to hundreds of latents. More importantly, we find the effectiveness of direct optimization to be highly competitive in 'zero-shot' learning. In contrast to large supervised networks, the VAEs investigated here can, for example, denoise a single image without prior training on clean data and/or on large image datasets. More generally, the studied approach shows that training VAEs is indeed possible without sampling-based approximations and reparameterization, which may be of interest for the analysis of VAE training in general. For 'zero-shot' settings, direct optimization furthermore makes VAEs competitive where they have previously been outperformed by non-generative approaches.
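
To make the training scheme described in the abstract concrete, the following is a minimal, dependency-free sketch of one data point's training loop, not the authors' implementation: it assumes a Bernoulli prior over binary latents and a hypothetical linear-Gaussian decoder (the paper uses neural decoders with exact backpropagated gradients). The truncated posterior is represented as a small set K of binary states; an evolutionary step proposes children by bit flips and lets the decoder's joint log-probability select the fittest states (step B), after which the decoder weights are updated by gradient ascent on the truncated ELBO (step A). All names (`log_joint`, `evolve`, `m_step`) and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: H binary latents, D observed dimensions, S states kept per data point.
H, D, S = 8, 16, 4

def log_joint(x, Z, W, pi=0.2, sigma=1.0):
    """log p(x, z; W) up to constants, for each binary state (row) of Z.
    Bernoulli(pi) prior over latents; linear-Gaussian decoder mean mu = z @ W."""
    log_prior = np.sum(Z * np.log(pi) + (1.0 - Z) * np.log(1.0 - pi), axis=1)
    mu = Z @ W                                      # decoder evaluates each state
    log_lik = -0.5 * np.sum((x - mu) ** 2, axis=1) / sigma**2
    return log_prior + log_lik

def evolve(x, K, W, n_children=32, flip_p=0.1):
    """One evolutionary step on the truncated posterior support K (S x H):
    mutate random parents by bit flips, then keep the S fittest states,
    with fitness given by the decoder's joint log-probability (step B)."""
    parents = K[rng.integers(0, len(K), n_children)]
    children = np.abs(parents - (rng.random(parents.shape) < flip_p))  # XOR flips
    pool = np.unique(np.vstack([K, children]), axis=0)
    return pool[np.argsort(log_joint(x, pool, W))[-S:]]

def m_step(x, K, W, lr=0.05, sigma=1.0):
    """Gradient ascent on the truncated ELBO (step A): q(z) is the joint
    probability renormalized over the kept states only, so no sampling
    approximation or reparameterization trick is needed."""
    lj = log_joint(x, K, W)
    q = np.exp(lj - lj.max())
    q /= q.sum()                                    # truncated posterior weights
    grad = K.T @ (q[:, None] * (x - K @ W)) / sigma**2
    return W + lr * grad

# Toy data point, random decoder weights, random initial latent states.
x = rng.standard_normal(D)
W = 0.1 * rng.standard_normal((H, D))
K = (rng.random((S, H)) < 0.5).astype(float)

for _ in range(100):
    K = evolve(x, K, W)   # decoder selects latent states for training
    W = m_step(x, K, W)   # discrete variational step ties into gradient ascent
```

Note that each data point carries its own state set K and no encoder network is trained, which is the non-amortized aspect the abstract refers to; this is also why the approach applies to a single image without prior training on clean data.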
