Active Reinforcement Learning: Observing Rewards at a Cost

Paper Title

Active Reinforcement Learning: Observing Rewards at a Cost

Authors

David Krueger, Jan Leike, Owain Evans, John Salvatier

Abstract

Active reinforcement learning (ARL) is a variant on reinforcement learning where the agent does not observe the reward unless it chooses to pay a query cost c > 0. The central question of ARL is how to quantify the long-term value of reward information. Even in multi-armed bandits, computing the value of this information is intractable and we have to rely on heuristics. We propose and evaluate several heuristic approaches for ARL in multi-armed bandits and (tabular) Markov decision processes, and discuss and illustrate some challenging aspects of the ARL problem.
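
To make the setting concrete, here is a minimal Python sketch of an ARL-style multi-armed bandit. It is an illustration only, not one of the heuristics proposed in the paper: the class name `ActiveBandit`, the function `run_query_then_exploit`, and the "query for a fixed number of steps, then act greedily without querying" rule are all assumptions made for this example. It shows the defining feature from the abstract: the reward always counts toward the return, but the agent sees it only when it pays the query cost c.

```python
import random


class ActiveBandit:
    """Hypothetical Bernoulli bandit in which observing the reward costs c > 0."""

    def __init__(self, arm_probs, query_cost, seed=0):
        self.arm_probs = arm_probs
        self.query_cost = query_cost
        self.rng = random.Random(seed)

    def step(self, arm, query):
        reward = 1.0 if self.rng.random() < self.arm_probs[arm] else 0.0
        # The reward is always earned; the cost is paid only when querying.
        net = reward - (self.query_cost if query else 0.0)
        # Without a query the agent receives no reward observation.
        observed = reward if query else None
        return observed, net


def run_query_then_exploit(env, n_arms, horizon, n_query_steps):
    """Assumed baseline: query round-robin first, then exploit without querying."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    total_return = 0.0
    n_queries = 0
    for t in range(horizon):
        query = t < n_query_steps
        if query:
            arm = t % n_arms  # cycle through the arms while paying to observe
        else:
            # Greedy choice on the empirical means gathered while querying.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a] if counts[a] else 0.0,
            )
        observed, net = env.step(arm, query)
        total_return += net
        if observed is not None:
            counts[arm] += 1
            sums[arm] += observed
            n_queries += 1
    return total_return, n_queries


if __name__ == "__main__":
    env = ActiveBandit(arm_probs=[0.2, 0.8], query_cost=0.5, seed=0)
    print(run_query_then_exploit(env, n_arms=2, horizon=100, n_query_steps=20))
```

The fixed query budget `n_query_steps` is the crude part of this sketch. Choosing when further reward observations stop being worth their cost is exactly the value-of-information question the abstract describes as intractable, which is why the paper turns to heuristics.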
