Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing

Paper Title

Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing

Authors

Arthur Delarue, Ross Anderson, Christian Tjandraatmadja

Abstract

Value-function-based methods have long played an important role in reinforcement learning. However, finding the best next action given a value function of arbitrary complexity is nontrivial when the action space is too large for enumeration. We develop a framework for value-function-based deep reinforcement learning with a combinatorial action space, in which the action selection problem is explicitly formulated as a mixed-integer optimization problem. As a motivating example, we present an application of this framework to the capacitated vehicle routing problem (CVRP), a combinatorial optimization problem in which a set of locations must be covered by a single vehicle with limited capacity. On each instance, we model an action as the construction of a single route, and consider a deterministic policy which is improved through a simple policy iteration algorithm. Our approach is competitive with other reinforcement learning methods and achieves an average gap of 1.7% with state-of-the-art OR methods on standard library instances of medium size.
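
The action-selection step described in the abstract can be illustrated with a minimal sketch on a toy instance. This is not the authors' implementation: the paper formulates action selection as a mixed-integer optimization problem, whereas the sketch below simply enumerates every feasible route, which only works for a handful of customers. The names `best_route`, `rollout`, and `toy_value` are hypothetical, and `toy_value` is a placeholder for a learned value function. The sketch also assumes every customer's demand fits within the vehicle capacity.

```python
from itertools import combinations, permutations
from math import dist


def route_cost(depot, stops):
    """Length of the tour depot -> stops (in order) -> depot."""
    path = [depot, *stops, depot]
    return sum(dist(a, b) for a, b in zip(path, path[1:]))


def best_route(depot, coords, demand, unvisited, capacity, value_fn):
    """Pick the feasible route minimizing route cost + estimated cost-to-go.

    Brute-force enumeration stands in for the mixed-integer optimization
    problem mentioned in the abstract (an assumption made for illustration).
    """
    best = None
    for k in range(1, len(unvisited) + 1):
        for subset in combinations(sorted(unvisited), k):
            if sum(demand[i] for i in subset) > capacity:
                continue  # route would exceed vehicle capacity
            remaining = unvisited - set(subset)
            for order in permutations(subset):
                cost = route_cost(depot, [coords[i] for i in order])
                total = cost + value_fn(remaining)
                if best is None or total < best[0]:
                    best = (total, order)
    return best


def rollout(depot, coords, demand, capacity, value_fn):
    """Build a full solution one route (one action) at a time."""
    unvisited = frozenset(coords)
    routes, total = [], 0.0
    while unvisited:
        _, order = best_route(depot, coords, demand, unvisited, capacity, value_fn)
        routes.append(order)
        total += route_cost(depot, [coords[i] for i in order])
        unvisited = unvisited - set(order)
    return routes, total


def toy_value(remaining):
    """Hypothetical cost-to-go estimate: a flat charge per unvisited customer."""
    return 2.0 * len(remaining)
```

Here a state is the set of unvisited customers and an action is one complete route, matching the modeling choice stated in the abstract. For example, `rollout((0.0, 0.0), {1: (1.0, 0.0), 2: (2.0, 0.0), 3: (0.0, 1.0)}, {1: 1, 2: 1, 3: 1}, 2, toy_value)` returns a list of routes and their total length. In a policy iteration loop, the value estimate would be refit to the costs observed under the current policy between rollouts.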

Paper Title