结合加强学习和约束编程以进行组合优化

论文标题

结合加强学习和约束编程以进行组合优化

Combining Reinforcement Learning and Constraint Programming for Combinatorial Optimization

论文作者

Cappart, Quentin, Moisan, Thierry, Rousseau, Louis-Martin, Prémont-Schwarz, Isabeau, Cire, Andre

论文摘要

组合优化已在从航空航天到运输规划和经济学的许多领域中发现了应用。目的是在有限的可能性中找到最佳解决方案。一个众所周知的挑战是一个具有组合优化的面临的挑战是状态空间爆炸问题：随着问题的规模，可能性的数量呈指数增长，这使得解决了大问题。在过去的几年中，深度强化学习（DRL）表现出了其设计良好的启发式方法的希望，该启发式方法致力于解决NP-HARD组合优化问题。但是，当前的方法有两个缺点：（1）它们主要关注标准的旅行者问题，并且不能轻易地扩展到其他问题，并且（2）他们仅提供一个近似解决方案，而没有系统的方法来改善它或证明最佳性。在另一种情况下，约束编程（CP）是解决组合优化问题的通用工具。基于完整的搜索过程，如果我们允许执行时间足够大，它将始终找到最佳解决方案。一个关键的设计选择，使CP在实践中使用，是分支决定，指导搜索空间的探索方式。在这项工作中，我们提出了一种基于DRL和CP的一般和混合方法，用于解决组合优化问题。我们方法的核心是基于动态编程公式，该公式充当两种技术之间的桥梁。我们在实验上表明，我们的求解器有效地解决了两个具有挑战性的问题：旅行推销员与时窗的问题以及4摩变的投资组合优化问题。结果表明，该框架引入的表现优于独立的RL和CP解决方案，同时与工业求解器竞争。

Combinatorial optimization has found applications in numerous fields, from aerospace to transportation planning and economics. The goal is to find an optimal solution among a finite set of possibilities. The well-known challenge one faces with combinatorial optimization is the state-space explosion problem: the number of possibilities grows exponentially with the problem size, which makes solving intractable for large problems. In the last years, deep reinforcement learning (DRL) has shown its promise for designing good heuristics dedicated to solve NP-hard combinatorial optimization problems. However, current approaches have two shortcomings: (1) they mainly focus on the standard travelling salesman problem and they cannot be easily extended to other problems, and (2) they only provide an approximate solution with no systematic ways to improve it or to prove optimality. In another context, constraint programming (CP) is a generic tool to solve combinatorial optimization problems. Based on a complete search procedure, it will always find the optimal solution if we allow an execution time large enough. A critical design choice, that makes CP non-trivial to use in practice, is the branching decision, directing how the search space is explored. In this work, we propose a general and hybrid approach, based on DRL and CP, for solving combinatorial optimization problems. The core of our approach is based on a dynamic programming formulation, that acts as a bridge between both techniques. We experimentally show that our solver is efficient to solve two challenging problems: the traveling salesman problem with time windows, and the 4-moments portfolio optimization problem. Results obtained show that the framework introduced outperforms the stand-alone RL and CP solutions, while being competitive with industrial solvers.

下载PDF全文

下载文献需遵守相关版权规定

论文标题