C-Learning: Horizon-Aware Cumulative Accessibility Estimation

Paper Title

C-Learning: Horizon-Aware Cumulative Accessibility Estimation

Authors

Panteha Naderian, Gabriel Loaiza-Ganem, Harry J. Braviner, Anthony L. Caterini, Jesse C. Cresswell, Tong Li, Animesh Garg

Abstract

Multi-goal reaching is an important problem in reinforcement learning needed to achieve algorithmic generalization. Despite recent advances in this field, current algorithms suffer from three major challenges: high sample complexity, learning only a single way of reaching the goals, and difficulties in solving complex motion planning tasks. In order to address these limitations, we introduce the concept of cumulative accessibility functions, which measure the reachability of a goal from a given state within a specified horizon. We show that these functions obey a recurrence relation, which enables learning from offline interactions. We also prove that optimal cumulative accessibility functions are monotonic in the planning horizon. Additionally, our method can trade off speed and reliability in goal-reaching by suggesting multiple paths to a single goal depending on the provided horizon. We evaluate our approach on a set of multi-goal discrete and continuous control tasks. We show that our method outperforms state-of-the-art goal-reaching algorithms in success rate, sample complexity, and path optimality. Our code is available at https://github.com/layer6ai-labs/CAE, and additional visualizations can be found at https://sites.google.com/view/learning-cae/.
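
The recurrence and monotonicity properties mentioned in the abstract can be illustrated with a small tabular sketch. The abstract does not state the recurrence itself, so the form used here is an assumption: C(s, a, g, h) is taken to be the maximum probability of reaching goal g within h steps after taking action a in state s, with C = 1 whenever s is already the goal, C = 0 at horizon 0 otherwise, and C(s, a, g, h) equal to the expectation over next states s' of max over a' of C(s', a', g, h - 1). The toy chain environment, the slip probability, and the function names are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def chain_transitions(n_states, slip):
    """Hypothetical toy MDP: a line of states, actions 0 (left) and 1 (right).

    With probability `slip` the agent stays in place instead of moving.
    Returns P with shape (S, A, S), where P[s, a, s2] = Pr(s2 | s, a).
    """
    P = np.zeros((n_states, 2, n_states))
    for s in range(n_states):
        left = max(s - 1, 0)
        right = min(s + 1, n_states - 1)
        P[s, 0, left] += 1.0 - slip
        P[s, 0, s] += slip
        P[s, 1, right] += 1.0 - slip
        P[s, 1, s] += slip
    return P

def cumulative_accessibility(P, goal, max_horizon):
    """Tabular dynamic programming over the assumed recurrence.

    Returns C with shape (max_horizon + 1, S, A), where C[h, s, a] is the
    best achievable probability of reaching `goal` within h steps.
    """
    n_states, n_actions, _ = P.shape
    C = np.zeros((max_horizon + 1, n_states, n_actions))
    C[:, goal, :] = 1.0  # already at the goal: reached for every horizon
    for h in range(1, max_horizon + 1):
        best_prev = C[h - 1].max(axis=1)  # act greedily with h - 1 steps left
        C[h] = P @ best_prev              # expectation over next states
        C[h, goal, :] = 1.0
    return C

P = chain_transitions(n_states=5, slip=0.2)
C = cumulative_accessibility(P, goal=4, max_horizon=8)

# Accessibility never decreases as the horizon grows.
is_monotonic = bool(np.all(np.diff(C, axis=0) >= -1e-12))
```

In this sketch, reaching state 4 from state 1 within exactly 3 steps requires three successful moves, so `C[3, 1, 1]` equals `0.8 ** 3`, and larger horizons can only raise that value. The paper's method learns such functions from offline interactions with function approximation; the exact tabular computation above only illustrates the structure of the recurrence.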
