Paper Title
Reinforcement Learning to Solve NP-hard Problems: an Application to the CVRP
Paper Authors
Paper Abstract
In this paper, we evaluate the use of Reinforcement Learning (RL) to solve a classic combinatorial optimization problem: the Capacitated Vehicle Routing Problem (CVRP). We formalize this problem in the RL framework and compare two of the most promising RL approaches with traditional solving techniques on a set of benchmark instances. We measure each approach by the quality of the solution it returns and the time required to return it. We find that, although it does not return the best solution, the RL approach has many advantages over traditional solvers. First, the versatility of the framework allows the resolution of more complex combinatorial problems. Moreover, instead of trying to solve a specific instance of the problem, the RL algorithm learns the skills required to solve the problem in general. The trained policy can then almost instantly provide a solution to an unseen instance without having to solve it from scratch. Finally, the use of trained models makes the RL solver by far the fastest, and therefore better suited for commercial use, where the user experience is paramount. Techniques such as Knowledge Transfer can also be used to improve the training efficiency of the algorithm and help solve bigger and more complex problems.
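To make the "formalize this problem in the RL framework" step concrete, the sketch below shows one common way the CVRP can be cast as a sequential decision problem: the state holds node coordinates, demands, the remaining vehicle load, and a visit mask; an action selects the next node (0 for a return to the depot); the reward is the negative travel distance. The class name, state encoding, and reward shaping are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal illustrative sketch of a CVRP environment in an RL-style
# reset/step interface. All design choices here are assumptions made
# for exposition; they do not reproduce the authors' implementation.
import numpy as np


class CVRPEnv:
    """One vehicle repeatedly leaves the depot (node 0), serves customers
    without exceeding its capacity, and pays the travelled distance."""

    def __init__(self, n_customers=20, capacity=30.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n = n_customers
        self.capacity = capacity

    def reset(self):
        # Depot at index 0, customers at 1..n, with random coordinates and demands.
        self.coords = self.rng.uniform(0.0, 1.0, size=(self.n + 1, 2))
        self.demands = np.concatenate(([0.0], self.rng.uniform(1.0, 10.0, self.n)))
        self.visited = np.zeros(self.n + 1, dtype=bool)
        self.visited[0] = True
        self.load = self.capacity
        self.position = 0  # start at the depot
        return self._observation()

    def _observation(self):
        # The policy observes coordinates, demands, remaining load and a visit mask.
        return {
            "coords": self.coords,
            "demands": self.demands,
            "load": self.load,
            "visited": self.visited.copy(),
            "position": self.position,
        }

    def step(self, action):
        # action = index of the next node to visit; 0 means return to the depot to refill.
        if action != 0 and (self.visited[action] or self.demands[action] > self.load):
            raise ValueError("infeasible action")
        dist = np.linalg.norm(self.coords[self.position] - self.coords[action])
        if action == 0:
            self.load = self.capacity
        else:
            self.load -= self.demands[action]
            self.visited[action] = True
        self.position = action
        done = bool(self.visited[1:].all())
        # Reward is the negative step distance, so maximizing return minimizes route length.
        return self._observation(), -dist, done
```

Under this kind of formulation, a trained policy simply rolls out one feasible action per step on a new instance, which is why inference is near-instant compared with solving each instance from scratch.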