Investigating the efficiency of marginalising over discrete parameters in Bayesian computations

Paper title

Investigating the efficiency of marginalising over discrete parameters in Bayesian computations

Authors

Wen Zhang, Jeffrey Pullin, Lyle Gurrin, Damjan Vukcevic

Abstract

Bayesian analysis methods often use some form of iterative simulation such as Monte Carlo computation. Models that involve discrete variables can sometimes pose a challenge, either because the methods used do not support such variables (e.g. Hamiltonian Monte Carlo) or because the presence of such variables can slow down the computation. A common workaround is to marginalise the discrete variables out of the model. While it is reasonable to expect that such marginalisation would also lead to more time-efficient computations, to our knowledge this has not been demonstrated beyond a few specialised models. We explored the impact of marginalisation on the computational efficiency for a few simple statistical models. Specifically, we considered two- and three-component Gaussian mixture models, and also the Dawid-Skene model for categorical ratings. We explored each with two software implementations of Markov chain Monte Carlo techniques: JAGS and Stan. We directly compared marginalised and non-marginalised versions of the same model using the samplers on the same software. Our results show that marginalisation on its own does not necessarily boost performance. Nevertheless, the best performance was usually achieved with Stan, which requires marginalisation. We conclude that there is no simple answer to whether or not marginalisation is helpful. It is not necessarily the case that, when turned 'on', this technique can be assured to provide computational benefit independent of other factors, nor is it likely to be the model component that has the largest impact on computational efficiency.
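To make the idea of marginalising a discrete variable concrete, the following is a minimal Python sketch for a two-component Gaussian mixture. It is an illustration only and not the authors' code (the paper used JAGS and Stan). All function and parameter names (`marginal_loglik`, `joint_loglik`, `theta`, `mu`, `sigma`) are hypothetical. The non-marginalised form keeps a component label z_i for each observation; the marginalised form sums the label out of each observation's likelihood on the log scale.

```python
import itertools
import math


def normal_logpdf(y, mu, sigma):
    """Log density of a Normal(mu, sigma) distribution evaluated at y."""
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2)


def log_sum_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def joint_loglik(ys, zs, theta, mu, sigma):
    """Non-marginalised form: log p(y, z | theta, mu, sigma).

    zs holds one discrete label (0 or 1) per observation;
    theta is the (assumed) mixing weight of component 0.
    """
    total = 0.0
    for y, z in zip(ys, zs):
        log_weight = math.log(theta) if z == 0 else math.log(1 - theta)
        total += log_weight + normal_logpdf(y, mu[z], sigma[z])
    return total


def marginal_loglik(ys, theta, mu, sigma):
    """Marginalised form: log p(y | theta, mu, sigma), labels summed out."""
    total = 0.0
    for y in ys:
        total += log_sum_exp(
            math.log(theta) + normal_logpdf(y, mu[0], sigma[0]),
            math.log(1 - theta) + normal_logpdf(y, mu[1], sigma[1]),
        )
    return total


def brute_force_marginal(ys, theta, mu, sigma):
    """Sum the joint over every label assignment (feasible only for tiny data)."""
    terms = (
        math.exp(joint_loglik(ys, zs, theta, mu, sigma))
        for zs in itertools.product((0, 1), repeat=len(ys))
    )
    return math.log(sum(terms))
```

On a small data set, `marginal_loglik` and `brute_force_marginal` should agree up to floating-point error. The marginalised version needs work linear in the number of observations, whereas the explicit sum over labels grows exponentially. Whether a sampler runs more efficiently on the marginalised model is a separate, empirical question, which is the one the abstract addresses.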
