Title

Active Reinforcement Learning: Observing Rewards at a Cost

Authors

David Krueger, Jan Leike, Owain Evans, John Salvatier

Abstract

Active reinforcement learning (ARL) is a variant on reinforcement learning where the agent does not observe the reward unless it chooses to pay a query cost c > 0. The central question of ARL is how to quantify the long-term value of reward information. Even in multi-armed bandits, computing the value of this information is intractable and we have to rely on heuristics. We propose and evaluate several heuristic approaches for ARL in multi-armed bandits and (tabular) Markov decision processes, and discuss and illustrate some challenging aspects of the ARL problem.
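
To make the setting in the abstract concrete, here is a minimal sketch of a bandit in which the reward is always accrued but only observed when the agent pays the query cost c. The class `ActiveBandit`, the function `query_then_commit`, and the fixed query budget are hypothetical names and choices made for illustration; they are not taken from the paper, and the baseline shown is not one of the heuristics the authors propose.

```python
import random


class ActiveBandit:
    """Bernoulli bandit where observing a reward costs `query_cost` (hypothetical helper)."""

    def __init__(self, arm_means, query_cost, seed=0):
        if query_cost <= 0:
            raise ValueError("ARL assumes a query cost c > 0")
        self.arm_means = list(arm_means)
        self.query_cost = query_cost
        self.rng = random.Random(seed)

    def step(self, arm, query):
        """Pull `arm` and return (observation, net_return).

        The reward is always accrued. It is revealed only if `query` is True,
        in which case the query cost is subtracted from the net return.
        """
        reward = 1.0 if self.rng.random() < self.arm_means[arm] else 0.0
        if query:
            return reward, reward - self.query_cost
        return None, reward


def query_then_commit(env, n_arms, horizon, queries_per_arm):
    """Assumed baseline: query each arm a fixed number of times, then exploit blindly."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    total = 0.0
    n_queries = 0
    for t in range(horizon):
        if t < n_arms * queries_per_arm:
            # Exploration phase: pay the cost to see the reward.
            arm = t % n_arms
            obs, net = env.step(arm, query=True)
            counts[arm] += 1
            sums[arm] += obs
            n_queries += 1
        else:
            # Exploitation phase: act on the estimates without paying to observe.
            means = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(n_arms)]
            arm = max(range(n_arms), key=lambda a: means[a])
            _, net = env.step(arm, query=False)
        total += net
    return total, n_queries


if __name__ == "__main__":
    env = ActiveBandit(arm_means=[0.3, 0.7], query_cost=0.1, seed=1)
    print(query_then_commit(env, n_arms=2, horizon=200, queries_per_arm=10))
```

The fixed budget `queries_per_arm` sidesteps the question the abstract calls central: deciding how many queries are worth paying for requires estimating the long-term value of the reward information, which is what the paper's heuristics are meant to address.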
