Paper Title

One Risk to Rule Them All: A Risk-Sensitive Perspective on Model-Based Offline Reinforcement Learning

Authors

Marc Rigter, Bruno Lacerda, Nick Hawes

Abstract

Offline reinforcement learning (RL) is suitable for safety-critical domains where online exploration is too costly or dangerous. In such safety-critical settings, decision-making should take into consideration the risk of catastrophic outcomes. In other words, decision-making should be risk-sensitive. Previous works on risk in offline RL combine together offline RL techniques, to avoid distributional shift, with risk-sensitive RL algorithms, to achieve risk-sensitivity. In this work, we propose risk-sensitivity as a mechanism to jointly address both of these issues. Our model-based approach is risk-averse to both epistemic and aleatoric uncertainty. Risk-aversion to epistemic uncertainty prevents distributional shift, as areas not covered by the dataset have high epistemic uncertainty. Risk-aversion to aleatoric uncertainty discourages actions that may result in poor outcomes due to environment stochasticity. Our experiments show that our algorithm achieves competitive performance on deterministic benchmarks, and outperforms existing approaches for risk-sensitive objectives in stochastic domains.
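The core idea in the abstract is that a single risk-averse objective can handle both problems at once: penalising epistemic uncertainty (disagreement between learned dynamics models) keeps the policy in-distribution, while penalising aleatoric uncertainty (spread within each model's stochastic predictions) avoids genuinely risky outcomes. Below is a minimal, hypothetical sketch of that intuition, not the paper's actual algorithm: returns are sampled across an ensemble of dynamics models and across each model's stochastic outputs, and an action is scored by the CVaR of the pooled samples rather than their mean. The `ensemble`, `model.sample`, and `critic` interfaces are assumptions made for illustration.

```python
# Hypothetical sketch: risk-averse action scoring over both uncertainty types.
# Epistemic spread comes from disagreement across ensemble members; aleatoric
# spread comes from stochastic samples within each member. CVaR over the pooled
# return samples penalises both.
import numpy as np


def cvar(samples: np.ndarray, alpha: float = 0.1) -> float:
    """Conditional value-at-risk: mean of the worst alpha-fraction of samples."""
    sorted_samples = np.sort(samples)
    cutoff = max(1, int(np.ceil(alpha * len(sorted_samples))))
    return float(sorted_samples[:cutoff].mean())


def risk_averse_value(ensemble, critic, state, action,
                      n_model_samples: int = 8, alpha: float = 0.1) -> float:
    """Score (state, action) by CVaR of returns sampled from every ensemble member.

    `ensemble` is assumed to be a list of stochastic dynamics models exposing
    .sample(state, action) -> (next_state, reward); `critic` maps a next state
    to an estimated value. Both interfaces are illustrative assumptions.
    """
    returns = []
    for model in ensemble:                    # epistemic: disagreement across models
        for _ in range(n_model_samples):      # aleatoric: stochasticity within a model
            next_state, reward = model.sample(state, action)
            returns.append(reward + critic(next_state))
    # Out-of-distribution actions inflate ensemble disagreement, so their worst
    # samples (and hence the CVaR score) are low; risky in-distribution actions
    # are penalised through the aleatoric spread instead.
    return cvar(np.array(returns), alpha)


if __name__ == "__main__":
    class DummyModel:
        """Toy stochastic model standing in for a learned dynamics model."""
        def __init__(self, bias: float):
            self.bias = bias

        def sample(self, state, action):
            rng = np.random.default_rng()
            return state, float(action + self.bias + rng.normal(scale=0.5))

    ensemble = [DummyModel(b) for b in (-0.2, 0.0, 0.2)]
    critic = lambda s: 0.0  # trivial critic for the demo
    print(risk_averse_value(ensemble, critic, state=0.0, action=1.0))
```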
