Marginal Utility for Planning in Continuous or Large Discrete Action Spaces

Paper title

Marginal Utility for Planning in Continuous or Large Discrete Action Spaces

Authors

Zaheen Farraz Ahmad, Levi H. S. Lelis, Michael Bowling

Abstract

Sample-based planning is a powerful family of algorithms for generating intelligent behavior from a model of the environment. Generating good candidate actions is critical to the success of sample-based planners, particularly in continuous or large action spaces. Typically, candidate action generation exhausts the action space, uses domain knowledge, or more recently, involves learning a stochastic policy to provide such search guidance. In this paper we explore explicitly learning a candidate action generator by optimizing a novel objective, marginal utility. The marginal utility of an action generator measures the increase in value of an action over previously generated actions. We validate our approach in both curling, a challenging stochastic domain with continuous state and action spaces, and a location game with a discrete but large action space. We show that a generator trained with the marginal utility objective outperforms hand-coded schemes built on substantial domain knowledge, trained stochastic policies, and other natural objectives for generating actions for sample-based planners.
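The abstract describes marginal utility only in words, so a small numeric illustration may help. The sketch below is one plausible reading, not the authors' formulation or training objective: each candidate action is credited with the amount by which its value exceeds the best value among the candidates generated before it, floored at zero. The function name `marginal_utilities`, the `baseline` parameter, and the example values are all hypothetical.

```python
from typing import List, Sequence


def marginal_utilities(values: Sequence[float], baseline: float = 0.0) -> List[float]:
    """Return, for each candidate, its gain over the best earlier candidate.

    `values` holds the estimated value of each candidate action, in the order
    the generator produced them. `baseline` is an assumed reference value for
    the first candidate, which has no predecessors.
    """
    gains: List[float] = []
    best_so_far = baseline
    for v in values:
        # A candidate only earns credit for improving on what came before.
        gains.append(max(0.0, v - best_so_far))
        best_so_far = max(best_so_far, v)
    return gains


# Hypothetical values for four generated candidate actions.
print(marginal_utilities([1.0, 3.0, 2.0, 5.0]))  # prints [1.0, 2.0, 0.0, 2.0]
```

Under this reading, the third candidate earns nothing because an earlier candidate is already better, and the gains sum to the improvement of the best candidate over the baseline.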
