Paper title
Active Reinforcement Learning: Observing Rewards at a Cost
Authors
Abstract
Active reinforcement learning (ARL) is a variant of reinforcement learning in which the agent does not observe the reward unless it chooses to pay a query cost c > 0. The central question of ARL is how to quantify the long-term value of reward information. Even in multi-armed bandits, computing the value of this information is intractable, so we have to rely on heuristics. We propose and evaluate several heuristic approaches for ARL in multi-armed bandits and (tabular) Markov decision processes, and we discuss and illustrate some challenging aspects of the ARL problem.
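To make the setting concrete, the following is a minimal sketch of an ARL-style multi-armed bandit. It is not taken from the paper: the class `ActiveBandit`, the function `run_query_then_commit`, the Bernoulli reward model, and the "query for a fixed number of steps, then commit" rule are all hypothetical choices made for illustration. The sketch also assumes that the agent's return is the accrued reward minus the query costs it paid.

```python
import random


class ActiveBandit:
    """Hypothetical Bernoulli bandit where a reward is seen only if queried."""

    def __init__(self, means, query_cost, seed=0):
        self.means = means            # success probability of each arm
        self.query_cost = query_cost  # the cost c > 0 paid per query
        self.rng = random.Random(seed)

    def step(self, arm, query):
        """Pull `arm`; return (observed_reward_or_None, net_return)."""
        reward = 1.0 if self.rng.random() < self.means[arm] else 0.0
        # Assumption: the reward is always accrued, but it is revealed
        # to the agent only when the agent pays the query cost.
        observed = reward if query else None
        net = reward - (self.query_cost if query else 0.0)
        return observed, net


def run_query_then_commit(env, n_arms, horizon, query_steps):
    """Illustrative baseline (not the paper's heuristic): query round-robin
    for `query_steps` steps, then commit to the empirically best arm and
    stop querying."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    total_return = 0.0
    n_queries = 0

    def empirical_mean(a):
        return sums[a] / counts[a] if counts[a] else 0.0

    for t in range(horizon):
        if t < query_steps:
            arm = t % n_arms
            observed, net = env.step(arm, query=True)
            counts[arm] += 1
            sums[arm] += observed
            n_queries += 1
        else:
            arm = max(range(n_arms), key=empirical_mean)
            _, net = env.step(arm, query=False)
        total_return += net
    return total_return, n_queries


if __name__ == "__main__":
    env = ActiveBandit(means=[0.2, 0.8], query_cost=0.5, seed=1)
    total, queries = run_query_then_commit(env, n_arms=2, horizon=100, query_steps=20)
    print(f"return={total:.1f} after {queries} queries")
```

The fixed `query_steps` budget sidesteps the hard part of the problem, namely deciding when reward information is worth its cost. It only shows the interface within which such a decision rule would operate.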