Solving the Order Batching and Sequencing Problem using Deep Reinforcement Learning

Title

Solving the Order Batching and Sequencing Problem using Deep Reinforcement Learning

Authors

Bram Cals, Yingqian Zhang, Remco Dijkman, Claudy van Dorst

Abstract

In e-commerce markets, on-time delivery is of great importance to customer satisfaction. In this paper, we present a Deep Reinforcement Learning (DRL) approach for deciding how and when orders should be batched and picked in a warehouse to minimize the number of tardy orders. In particular, the technique supports decisions on whether an order should be picked individually (pick-by-order) or in a batch with other orders (pick-by-batch) and, if batched, with which other orders. We formulate the problem as a semi-Markov decision process and develop a vector-based state representation that captures the characteristics of the warehouse system. On this basis, we build a deep reinforcement learning solution that learns a strategy by interacting with the environment and solves the problem with a proximal policy optimization (PPO) algorithm. We evaluate the proposed DRL approach by comparing it with several batching and sequencing heuristics across different problem settings. The results show that the DRL approach develops a strategy that consistently produces good solutions and outperforms the heuristics it is compared against.
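To make the objective concrete, the following toy sketch counts tardy orders for a given sequence of pick tours. It assumes a hypothetical single-picker timing model that is not taken from the paper: every tour costs a fixed setup time plus a per-order picking time, and all orders in a tour finish when the tour finishes. The names `Order`, `count_tardy`, `SETUP_TIME` and `PER_ORDER_TIME` are illustrative only. This is not the paper's semi-Markov formulation or its PPO agent; it only shows why the choice between pick-by-order, pick-by-batch, and which orders to group changes the number of tardy orders.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    due_time: float  # deadline, in hypothetical time units


# Hypothetical timing model (an assumption, not from the paper):
# each pick tour costs a fixed setup time plus a per-order picking time.
SETUP_TIME = 10.0
PER_ORDER_TIME = 2.0


def count_tardy(tours: list[list[Order]], start_time: float = 0.0) -> int:
    """Process pick tours in the given order with a single picker and
    count the orders that complete strictly after their due time."""
    clock = start_time
    tardy = 0
    for tour in tours:
        # A tour of one order is pick-by-order; a longer tour is pick-by-batch.
        clock += SETUP_TIME + PER_ORDER_TIME * len(tour)
        # Every order in the tour is completed when the tour ends.
        tardy += sum(1 for order in tour if clock > order.due_time)
    return tardy


if __name__ == "__main__":
    a = Order("a", 13.0)
    b = Order("b", 30.0)
    c = Order("c", 30.0)
    print(count_tardy([[a], [b], [c]]))  # pick-by-order for all: 1 (c is late)
    print(count_tardy([[a, b, c]]))      # one batch for all: 1 (a is late)
    print(count_tardy([[a], [b, c]]))    # a alone, then b and c batched: 0
```

In this toy setting, neither always picking individually nor always batching is best; the mixed decision avoids all tardiness. Learning such decisions as orders arrive over time is the role the abstract assigns to the DRL policy.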
