Paper Title

Provable Multi-Objective Reinforcement Learning with Generative Models

Paper Authors

Dongruo Zhou, Jiahao Chen, Quanquan Gu

Paper Abstract

Multi-objective reinforcement learning (MORL) is an extension of ordinary, single-objective reinforcement learning (RL) that is applicable to many real-world tasks where multiple objectives exist without known relative costs. We study the problem of single-policy MORL, which learns an optimal policy given the preference of objectives. Existing methods require strong assumptions such as exact knowledge of the multi-objective Markov decision process, and are analyzed in the limit of infinite data and time. We propose a new algorithm called model-based envelope value iteration (EVI), which generalizes the enveloped multi-objective $Q$-learning algorithm of Yang et al. (2019). Our method can learn a near-optimal value function with polynomial sample complexity and linear convergence speed. To the best of our knowledge, this is the first finite-sample analysis of MORL algorithms.
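
The abstract describes the approach only at a high level. Below is a minimal, illustrative sketch of the two-step idea it names: estimate the transition model of a tabular multi-objective MDP from generative-model samples, then run an envelope-style value-iteration backup for a set of preference vectors. Everything here is an assumption made for illustration, including the function name `model_based_evi`, the `sample_next_state` callback standing in for generative-model access, and the restriction to a finite preference set; the paper's actual EVI algorithm and its sample-complexity analysis differ in the details.

```python
import numpy as np

def model_based_evi(sample_next_state, rewards, prefs,
                    gamma=0.9, n_samples=100, n_iters=50):
    """Illustrative sketch only (not the paper's exact EVI).

    sample_next_state(s, a) -> s' : generative-model access to the MOMDP
    rewards : array (S, A, m)     : known vector-valued reward
    prefs   : array (K, m)        : finite set of preference vectors
    Returns Q of shape (K, S, A, m), one vector-valued Q per preference.
    """
    S, A, m = rewards.shape
    K = prefs.shape[0]

    # Step 1: empirical transition model from n_samples generative-model
    # draws per state-action pair.
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            for _ in range(n_samples):
                P_hat[s, a, sample_next_state(s, a)] += 1.0 / n_samples

    # Step 2: envelope value iteration on the estimated model.
    Q = np.zeros((K, S, A, m))
    for _ in range(n_iters):
        # Scalarize every stored Q-vector under every target preference:
        # scores[k, j, s, a] = prefs[k] . Q[j, s, a].
        scores = np.einsum('km,jsam->kjsa', prefs, Q)
        # Envelope maximization: for target preference k and state s,
        # pick the (preference j, action a) pair whose Q-vector
        # scalarizes best under prefs[k].
        flat_scores = scores.transpose(0, 2, 1, 3).reshape(K, S, K * A)
        best = flat_scores.argmax(axis=-1)                    # (K, S)
        Q_flat = Q.transpose(1, 0, 2, 3).reshape(S, K * A, m)
        V = Q_flat[np.arange(S)[None, :], best]               # (K, S, m)
        # Vector-valued Bellman backup through the estimated model.
        Q = rewards[None] + gamma * np.einsum('sat,kti->ksai', P_hat, V)
    return Q

# Toy usage: a random 5-state, 2-action MOMDP with 2 objectives.
rng = np.random.default_rng(0)
P_true = rng.dirichlet(np.ones(5), size=(5, 2))        # (S, A, S)
rewards = rng.random((5, 2, 2))                        # (S, A, m)
prefs = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
Q = model_based_evi(lambda s, a: rng.choice(5, p=P_true[s, a]),
                    rewards, prefs)
```

The envelope step follows the spirit of Yang et al. (2019): each backup maximizes the scalarized value jointly over actions and over the stored preference-conditioned Q-vectors, rather than over actions alone.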
