Paper Title

Local Explanations for Reinforcement Learning

Authors

Ronny Luss, Amit Dhurandhar, Miao Liu

Abstract

Many works in explainable AI have focused on explaining black-box classification models. Explaining deep reinforcement learning (RL) policies in a manner that could be understood by domain users has received much less attention. In this paper, we propose a novel perspective for understanding RL policies based on identifying important states from automatically learned meta-states. The key conceptual difference between our approach and many previous ones is that we form meta-states based on locality governed by the expert policy dynamics rather than on similarity of actions, and that we do not assume any particular knowledge of the underlying topology of the state space. Theoretically, we show that our algorithm for finding meta-states converges, and that the objective that selects important states from each meta-state is submodular, leading to efficient, high-quality greedy selection. Experiments on four domains (four rooms, door-key, minipacman, and pong) and a carefully conducted user study illustrate that our perspective leads to a better understanding of the policy. We conjecture that this is a result of our meta-states being more intuitive, in that the corresponding important states are strong indicators of tractable intermediate goals that are easier for humans to interpret and follow.
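
The abstract states that the objective for picking important states within each meta-state is submodular, which is what makes greedy selection both efficient and near-optimal. Below is a minimal Python sketch of the standard greedy scheme for monotone submodular maximization; it is a generic illustration, not the paper's actual algorithm, and the names `candidates`, `objective`, and `budget` are hypothetical placeholders for a meta-state's states, the paper's importance objective, and the selection size.

```python
def greedy_select(candidates, objective, budget):
    """Greedily pick up to `budget` items to maximize a set objective.

    For monotone submodular objectives, this standard greedy scheme
    carries the classic (1 - 1/e) approximation guarantee, which is
    why submodularity makes high-quality selection efficient.
    NOTE: a generic sketch under assumed interfaces, not the paper's
    implementation; `objective` stands in for its importance objective.
    """
    selected = []
    remaining = set(candidates)
    while remaining and len(selected) < budget:
        # Choose the candidate with the largest marginal gain.
        gain = lambda s: objective(selected + [s]) - objective(selected)
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical usage: a coverage-style (submodular) objective over states.
if __name__ == "__main__":
    neighbors = {1: {1, 2}, 2: {2, 3, 4}, 3: {3, 4, 5}, 4: {4}, 5: {5, 1}}
    coverage = lambda S: len(set().union(*(neighbors[s] for s in S))) if S else 0
    print(greedy_select(neighbors.keys(), coverage, budget=2))
```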
