Banach空间中最佳方差减少随机近似

论文标题

Banach空间中最佳方差减少随机近似

Optimal variance-reduced stochastic approximation in Banach spaces

论文作者

Mou, Wenlong, Khamaru, Koulik, Wainwright, Martin J., Bartlett, Peter L., Jordan, Michael I.

论文摘要

我们研究了估计在可分开的Banach空间上定义的承包操作员的固定点的问题。着眼于提供操作员嘈杂评估的随机查询模型，我们分析了降低方差的随机近似方案，并为操作员缺陷和估计误差建立了非扰动界限，并以任意半合理测量。与最坏情况的保证相反，我们的边界是依赖实例的，并且非质子上的局部渐近降低风险。对于线性操作员，可以将合同性放松到多步合并性，以便将理论应用于加强学习中的平均奖励政策评估问题等问题。我们通过应用程序说明了该理论，用于随机最短路径问题，两人零和马尔可夫游戏，以及策略评估和$ q $ - 对表格马尔可夫决策过程的学习。

We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space. Focusing on a stochastic query model that provides noisy evaluations of the operator, we analyze a variance-reduced stochastic approximation scheme, and establish non-asymptotic bounds for both the operator defect and the estimation error, measured in an arbitrary semi-norm. In contrast to worst-case guarantees, our bounds are instance-dependent, and achieve the local asymptotic minimax risk non-asymptotically. For linear operators, contractivity can be relaxed to multi-step contractivity, so that the theory can be applied to problems like average reward policy evaluation problem in reinforcement learning. We illustrate the theory via applications to stochastic shortest path problems, two-player zero-sum Markov games, as well as policy evaluation and $Q$-learning for tabular Markov decision processes.

下载PDF全文

下载文献需遵守相关版权规定

论文标题