Paper Title
Infinite-dimensional gradient-based descent for alpha-divergence minimisation
Paper Authors
Paper Abstract
This paper introduces the $(α, Γ)$-descent, an iterative algorithm which operates on measures and performs $α$-divergence minimisation in a Bayesian framework. This gradient-based procedure extends the commonly used variational approximation by adding a prior on the variational parameters in the form of a measure. We prove that for a rich family of functions $Γ$, this algorithm leads at each step to a systematic decrease in the $α$-divergence, and we derive convergence results. Our framework recovers the Entropic Mirror Descent algorithm and provides an alternative algorithm that we call the Power Descent. Moreover, in its stochastic formulation, the $(α, Γ)$-descent makes it possible to optimise the mixture weights of any given mixture model without any information on the underlying distribution of the variational parameters. This renders our method compatible with many choices of parameter updates and applicable to a wide range of Machine Learning tasks. We demonstrate empirically, on both toy and real-world examples, the benefit of using the Power Descent and going beyond the Entropic Mirror Descent framework, which fails as the dimension grows.
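The abstract describes the stochastic formulation of the $(α, Γ)$-descent as a multiplicative update of the weights of a mixture model. The sketch below is a minimal illustration of that idea, not the authors' reference implementation: the target `log_p_tilde`, the fixed Gaussian components, the per-component signal `b`, the step size `eta`, and the exact exponents used in the "mirror" and "power" transforms are all assumptions chosen so the example runs end-to-end; the paper's own definitions (stated via $f_α$, $Γ$ and a constant $κ$) differ in normalisation.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

rng = np.random.default_rng(0)

# Unnormalised target log-density (a two-mode Gaussian mixture, assumed purely for illustration).
def log_p_tilde(y):
    return np.logaddexp(norm.logpdf(y, -2.0, 0.5), norm.logpdf(y, 2.0, 0.5)) - np.log(2.0)

# Fixed variational components q_k: only the mixture weights are learned here.
means = np.linspace(-4.0, 4.0, 8)
scale = 1.0

def log_components(y):
    # Shape (n_samples, K): log q_k(y_i) for every component k.
    return norm.logpdf(y[:, None], means[None, :], scale)

def update_weights(w, alpha=0.5, eta=0.3, n_samples=2000, mode="power"):
    """One stochastic multiplicative update of the simplex vector w."""
    # Sample from the current mixture q_w = sum_k w_k q_k.
    ks = rng.choice(len(w), size=n_samples, p=w)
    y = rng.normal(means[ks], scale)
    log_qk = log_components(y)                        # log q_k(y_i)
    log_qw = logsumexp(log_qk + np.log(w), axis=1)    # log q_w(y_i)
    # Per-component Monte Carlo signal b_k ~ E_{q_k}[(q_w / p~)^(alpha - 1)],
    # estimated by reweighting samples drawn from q_w (an assumed convention).
    ratio = np.exp((alpha - 1.0) * (log_qw - log_p_tilde(y)))
    resp = np.exp(log_qk - log_qw[:, None])           # q_k(y_i) / q_w(y_i)
    b = (resp * ratio[:, None]).mean(axis=0)
    if mode == "mirror":
        # Entropic-Mirror-Descent-style exponential transform (assumed form).
        w_new = w * np.exp(eta / (1.0 - alpha) * (b - 1.0))
    else:
        # Power-Descent-style power transform (assumed form).
        w_new = w * b ** (eta / (1.0 - alpha))
    return w_new / w_new.sum()

w = np.full(len(means), 1.0 / len(means))
for _ in range(50):
    w = update_weights(w, mode="power")
print(np.round(w, 3))   # weights concentrate on components near the target's modes
```

The only information the update uses about each component is samples and log-density evaluations, which reflects the property highlighted in the abstract: the weight update needs no information on how the variational parameters themselves are produced, so it can be combined with many choices of parameter updates.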