Paper Title

Markovian Foundations for Quasi-Stochastic Approximation with Applications to Extremum Seeking Control

Authors

Caio Kalil Lauand and Sean Meyn

Abstract

This paper concerns quasi-stochastic approximation (QSA) to solve root-finding problems commonly found in applications to optimization and reinforcement learning. The general constant-gain algorithm may be expressed as the time-inhomogeneous ODE $\frac{d}{dt}Θ_t = αf_t(Θ_t)$, with state process $Θ$ evolving on $\mathbb{R}^d$. Theory is based on an almost periodic vector field, so that in particular the time average of $f_t(θ)$ defines the time-homogeneous mean vector field $\bar{f} \colon \mathbb{R}^d \to \mathbb{R}^d$ with $\bar{f}(θ^*)=0$. Under smoothness assumptions on the functions involved, the following exact representation is obtained: \[\frac{d}{dt}Θ_t=α[\bar{f}(Θ_t)-α\bar{Υ}_t+α^2\mathcal{W}_t^0+α\frac{d}{dt}\mathcal{W}_t^1+\frac{d^2}{dt^2}\mathcal{W}_t^2]\] along with formulae for the smooth signals $\{\bar{Υ}_t, \mathcal{W}_t^i : i=0,1,2\}$. This new representation, combined with new conditions for ultimate boundedness, has many applications for furthering the theory of QSA and its use in practice, including the following implications that are developed in this paper: (i) A proof that the estimation error $\|Θ_t-θ^*\|$ is of order $O(α)$, but can be reduced to $O(α^2)$ using a second-order linear filter. (ii) In application to extremum seeking control, it is found that the results do not apply because the standard algorithms are not Lipschitz continuous. A new approach is presented to ensure that the required Lipschitz bounds hold, and from this we obtain stability, transient bounds, asymptotic bias of order $O(α^2)$, and asymptotic variance of order $O(α^4)$. (iii) It is in general possible to obtain better-than-$O(α)$ bounds on the error in traditional stochastic approximation when there is Markovian noise.
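
To make the constant-gain QSA form concrete, here is a minimal sketch (not taken from the paper) that Euler-discretizes the ODE $\frac{d}{dt}Θ_t = αf_t(Θ_t)$ for one-point extremum seeking on a toy quadratic objective with sinusoidal probing. The objective, gain, probing amplitude, frequencies, and step size below are illustrative assumptions; note this is the standard one-point construction whose lack of a global Lipschitz bound is precisely the issue addressed in item (ii).

```python
import numpy as np

# Illustrative sketch (assumptions, not the paper's algorithm): Euler discretization
# of the constant-gain QSA ODE  d/dt Theta_t = alpha * f_t(Theta_t)  applied to
# one-point extremum seeking on a toy quadratic objective.

def objective(theta):
    # Toy objective with minimizer theta* = (1, -2).
    return 0.5 * (theta[0] - 1.0) ** 2 + 0.5 * (theta[1] + 2.0) ** 2

alpha = 0.05                                   # constant gain alpha (assumed value)
eps = 0.5                                      # probing amplitude (assumed value)
omegas = np.array([25.0, 25.0 * np.sqrt(2.0)]) # incommensurate frequencies: almost periodic probing
dt = 1e-3                                      # Euler integration step for the ODE
T = 150.0                                      # simulation horizon

theta = np.array([3.0, 1.0])                   # initial condition Theta_0
t = 0.0
while t < T:
    xi = np.sin(omegas * t)                    # quasi-stochastic (deterministic sinusoidal) probing
    # One-point gradient-free field: its time average over the probing signal is
    # approximately -grad J(theta), so the mean vector field fbar vanishes at theta*.
    f_t = -(2.0 / eps) * objective(theta + eps * xi) * xi
    theta = theta + dt * alpha * f_t           # Euler step of d/dt Theta = alpha * f_t(Theta)
    t += dt

print("final estimate:", theta)                # close to theta* = (1, -2), up to O(alpha) bias
```

In this discretization the constant gain $α$ is distinct from the numerical step `dt`: the $O(α)$ bias and the second-order filtering remedy described in item (i) refer to the gain $α$ of the continuous-time algorithm, not to the integration step used to simulate it.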
