Paper Title

Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models

Authors

Zhijian Ou, Yunfu Song

Abstract

Despite progress in introducing auxiliary amortized inference models, learning discrete latent variable models is still challenging. In this paper, we show that the annoying difficulty of obtaining reliable stochastic gradients for the inference model and the drawback of indirectly optimizing the target log-likelihood can be gracefully addressed in a new method based on stochastic approximation (SA) theory of the Robbins-Monro type. Specifically, we propose to directly maximize the target log-likelihood and simultaneously minimize the inclusive divergence between the posterior and the inference model. The resulting learning algorithm is called joint SA (JSA). To the best of our knowledge, JSA represents the first method that couples an SA version of the EM (expectation-maximization) algorithm (SAEM) with an adaptive MCMC procedure. Experiments on several benchmark generative modeling and structured prediction tasks show that JSA consistently outperforms recent competitive algorithms, with faster convergence, better final likelihoods, and lower variance of gradient estimates.
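
As a rough sketch of the two coupled objectives described in the abstract (the notation below, with observation x, discrete latent h, generative model p_θ, and amortized inference model q_φ, is introduced here for illustration and is not taken from this page):

```latex
% Hedged sketch of the two objectives named in the abstract.
% p_\theta(h, x): latent variable model;  q_\phi(h \mid x): amortized inference model.
% (1) Directly maximize the target (marginal) log-likelihood over \theta:
\max_{\theta} \; \log p_{\theta}(x) = \log \sum_{h} p_{\theta}(h, x)
% (2) Simultaneously minimize the inclusive divergence, KL(posterior || inference model), over \phi:
\min_{\phi} \; \mathrm{KL}\!\bigl( p_{\theta}(h \mid x) \,\big\|\, q_{\phi}(h \mid x) \bigr)
```

Both gradients are expectations under the posterior p_θ(h | x), which presumably is where the adaptive MCMC procedure mentioned in the abstract enters: it supplies approximate posterior samples (with q_φ serving as an adaptively improving proposal), so that SAEM-style stochastic updates of θ and φ can be run jointly.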
