Paper Title


Risk-Sensitive Reinforcement Learning: a Martingale Approach to Reward Uncertainty

Authors

Nelson Vadori, Sumitra Ganesh, Prashant Reddy, Manuela Veloso

Abstract


We introduce a novel framework to account for sensitivity to rewards uncertainty in sequential decision-making problems. While risk-sensitive formulations for Markov decision processes studied so far focus on the distribution of the cumulative reward as a whole, we aim at learning policies sensitive to the uncertain/stochastic nature of the rewards, which has the advantage of being conceptually more meaningful in some cases. To this end, we present a new decomposition of the randomness contained in the cumulative reward based on the Doob decomposition of a stochastic process, and introduce a new conceptual tool - the \textit{chaotic variation} - which can rigorously be interpreted as the risk measure of the martingale component associated to the cumulative reward process. We innovate on the reinforcement learning side by incorporating this new risk-sensitive approach into model-free algorithms, both policy gradient and value function based, and illustrate its relevance on grid world and portfolio optimization problems.
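The Doob decomposition mentioned in the abstract is a standard result for adapted integrable processes; a minimal statement applied to the cumulative reward process, with notation (per-step reward \(r_s\), filtration \(\mathcal{F}_s\)) assumed here rather than taken from the paper:

```latex
\begin{align*}
% Cumulative reward process up to time t
C_t &= \sum_{s=1}^{t} r_s,\\
% Doob decomposition: predictable part A_t plus martingale part M_t
C_t &= A_t + M_t,\\
A_t &= \sum_{s=1}^{t} \mathbb{E}\left[ r_s \mid \mathcal{F}_{s-1} \right],\qquad
M_t = \sum_{s=1}^{t} \left( r_s - \mathbb{E}\left[ r_s \mid \mathcal{F}_{s-1} \right] \right).
\end{align*}
```

Here \(A_t\) is predictable (measurable with respect to \(\mathcal{F}_{t-1}\)) and \(M_t\) is a martingale, since each increment \(r_s - \mathbb{E}[r_s \mid \mathcal{F}_{s-1}]\) has zero conditional mean. Applying a risk measure to the martingale component \(M\), rather than to \(C\) as a whole, isolates the stochastic part of the rewards, which is the interpretation the abstract gives to the chaotic variation.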
