Paper Title


Making Simulated Annealing Sample Efficient for Discrete Stochastic Optimization

Paper Authors

Shah, Suhail M.

Paper Abstract


We study the regret of simulated annealing (SA) based approaches to solving discrete stochastic optimization problems. The main theoretical conclusion is that the regret of the simulated annealing algorithm, with either noisy or noiseless observations, depends primarily upon the rate of convergence of the associated Gibbs measure to the optimal states. In contrast to previous works, we show that SA does not need an increased estimation effort (number of \textit{pulls/samples} of the selected \textit{arm/solution} per round for a finite horizon $n$) with noisy observations to converge in probability. With simple modifications, the total number of samples required for convergence (in probability) over the horizon $n$ can be made to scale as $\mathcal{O}(n)$. Additionally, we show that a simulated annealing inspired heuristic can solve the problem of stochastic multi-armed bandits (MAB), by which we mean that it suffers an $\mathcal{O}(\log n)$ regret. Thus, our contention is that SA should be considered a viable candidate for inclusion in the family of efficient exploration heuristics for bandit and discrete stochastic optimization problems.
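For context, the "associated Gibbs measure" mentioned in the abstract is, in the standard SA setup (an assumption here; the paper's exact parameterization may differ), the Boltzmann distribution over the discrete search space $\mathcal{X}$ at inverse temperature $\beta$:

\[
\pi_\beta(x) = \frac{e^{-\beta f(x)}}{\sum_{y \in \mathcal{X}} e^{-\beta f(y)}}, \qquad x \in \mathcal{X},
\]

which concentrates on the minimizers of the objective $f$ as $\beta \to \infty$. The abstract's claim is that the regret is governed by how quickly this concentration happens along the cooling schedule.

Below is a minimal, illustrative sketch of a simulated-annealing-style heuristic for a stochastic multi-armed bandit, in the spirit of the abstract but not the paper's algorithm: the Bernoulli reward model, the uniform proposal distribution, the $\log(t)$ inverse-temperature schedule, and the one-sample-per-round estimates are all assumptions made for this example.

```python
import math
import random


def sa_bandit(arm_means, horizon, seed=0):
    """SA-inspired bandit sketch: pull the current arm, then propose a random
    arm and accept it with a Metropolis-style probability based on the
    empirical mean rewards and a slowly increasing inverse temperature."""
    rng = random.Random(seed)
    k = len(arm_means)

    def pull(a):
        # Bernoulli reward with mean arm_means[a] (assumed reward model).
        return 1.0 if rng.random() < arm_means[a] else 0.0

    # Initialise each arm's empirical mean with a single pull.
    counts = [1] * k
    est = [pull(a) for a in range(k)]
    total_reward = sum(est)
    current = rng.randrange(k)

    for t in range(k + 1, horizon + 1):
        # Pull the current arm and update its running mean estimate.
        r = pull(current)
        total_reward += r
        counts[current] += 1
        est[current] += (r - est[current]) / counts[current]

        # Propose a uniformly random arm and accept it with a Metropolis-style
        # probability; the inverse temperature grows like log(t) ("cooling"),
        # so random switches to seemingly worse arms become rare over time.
        proposal = rng.randrange(k)
        beta = math.log(t + 1.0)
        diff = est[proposal] - est[current]
        if diff >= 0 or rng.random() < math.exp(beta * diff):
            current = proposal

    best = max(arm_means)
    return horizon * best - total_reward  # empirical regret over the horizon


if __name__ == "__main__":
    print(sa_bandit([0.2, 0.5, 0.8], horizon=5000))
```

Running the snippet prints the empirical regret over the horizon; because the acceptance of worse arms decays with the cooling schedule, exploration tapers off, which is the intuition behind an $\mathcal{O}(\log n)$-type regret for SA-inspired heuristics as claimed in the abstract.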
