Paper Title

A policy gradient approach for optimization of smooth risk measures

Authors

Nithia Vijayan, Prashanth L. A.

Abstract

We propose policy gradient algorithms for solving a risk-sensitive reinforcement learning (RL) problem in on-policy as well as off-policy settings. We consider episodic Markov decision processes, and model the risk using the broad class of smooth risk measures of the cumulative discounted reward. We propose two template policy gradient algorithms that optimize a smooth risk measure in on-policy and off-policy RL settings, respectively. We derive non-asymptotic bounds that quantify the rate of convergence of our proposed algorithms to a stationary point of the smooth risk measure. As special cases, we establish that our algorithms apply to optimization of mean-variance and distortion risk measures, respectively.
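To make the on-policy setting concrete, below is a minimal sketch of a likelihood-ratio (REINFORCE-style) policy gradient step for one of the special cases the abstract mentions, a mean-variance objective J(θ) = E[G] − λ·Var(G) of the cumulative discounted reward G. This is not the paper's template algorithm: the toy random-walk MDP, tabular softmax policy, batch size, and step size are all illustrative assumptions, and the variance gradient uses the standard identity ∇Var(G) = ∇E[G²] − 2·E[G]·∇E[G].

```python
# Minimal sketch (assumptions, not the paper's algorithm): on-policy
# policy gradient ascent on J(theta) = E[G] - lambda * Var(G), where G
# is the discounted return of an episode in a toy random-walk MDP.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 4, 2           # toy MDP size (assumed)
GAMMA, LAM, LR, BATCH, ITERS = 0.95, 0.1, 0.05, 32, 200


def softmax_policy(theta, s):
    """Action probabilities of a tabular softmax policy in state s."""
    z = theta[s] - theta[s].max()
    p = np.exp(z)
    return p / p.sum()


def rollout(theta, horizon=20):
    """Run one episode; return the discounted return G and the score
    function sum_t grad_theta log pi(a_t | s_t)."""
    s, G, disc = 0, 0.0, 1.0
    score = np.zeros_like(theta)
    for _ in range(horizon):
        p = softmax_policy(theta, s)
        a = rng.choice(N_ACTIONS, p=p)
        # grad log pi for tabular softmax: indicator(a) - pi(.|s)
        score[s] -= p
        score[s, a] += 1.0
        r = 1.0 if (s == N_STATES - 1 and a == 1) else -0.01
        s = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
        G += disc * r
        disc *= GAMMA
    return G, score


theta = np.zeros((N_STATES, N_ACTIONS))
for it in range(ITERS):
    returns, scores = [], []
    for _ in range(BATCH):
        G, sc = rollout(theta)
        returns.append(G)
        scores.append(sc)
    returns = np.array(returns)
    mean_G = returns.mean()
    # Likelihood-ratio estimates of grad E[G] and grad E[G^2], then
    # grad Var(G) = grad E[G^2] - 2 E[G] grad E[G].
    grad_mean = np.mean([G * sc for G, sc in zip(returns, scores)], axis=0)
    grad_sq = np.mean([G * G * sc for G, sc in zip(returns, scores)], axis=0)
    grad_var = grad_sq - 2.0 * mean_G * grad_mean
    theta += LR * (grad_mean - LAM * grad_var)  # ascend the risk objective

print("final mean return ~", round(mean_G, 3))
```

An off-policy variant of the same estimator would reweight each episode's score terms by importance-sampling ratios between the target and behavior policies; the paper's contribution is the general template for smooth risk measures together with non-asymptotic convergence bounds, which this sketch does not reproduce.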
