Reinforcement Learning for Multi-Product Multi-Node Inventory Management in Supply Chains

Paper Title

Reinforcement Learning for Multi-Product Multi-Node Inventory Management in Supply Chains

Authors

Sultana, Nazneen N; Meisheri, Hardik; Baniwal, Vinita; Nath, Somjit; Ravindran, Balaraman; Khadilkar, Harshad

Abstract

This paper describes the application of reinforcement learning (RL) to multi-product inventory management in supply chains. The problem description and solution are both adapted from a real-world business solution. The novelty of this problem with respect to the supply chain literature is that (i) we consider concurrent inventory management of a large number (50 to 1000) of products with shared capacity, (ii) we consider a multi-node supply chain consisting of a warehouse which supplies three stores, (iii) the warehouse, the stores, and transportation from the warehouse to the stores have finite capacities, (iv) warehouse and store replenishment happen at different time scales and with realistic time lags, and (v) demand for products at the stores is stochastic. We describe a novel formulation in a multi-agent (hierarchical) reinforcement learning framework that can be used for parallelised decision-making, and use the advantage actor critic (A2C) algorithm with quantised action spaces to solve the problem. Experiments show that the proposed approach is able to handle a multi-objective reward that combines maximising product sales with minimising wastage of perishable products.
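The abstract mentions quantised action spaces and capacity shared across products. A minimal sketch of how such a scheme might look is given below. It is not the authors' implementation: the action levels, the function names, and the proportional scaling rule are all assumptions made for illustration.

```python
from typing import List, Sequence

# Hypothetical quantisation (not taken from the paper): each discrete action
# index selects a fraction of a product's maximum stock level to reorder.
ACTION_LEVELS: List[float] = [0.0, 0.25, 0.5, 0.75, 1.0]


def requested_quantities(
    actions: Sequence[int],
    max_stock: Sequence[float],
    inventory: Sequence[float],
) -> List[float]:
    """Map per-product discrete actions to order quantities.

    Each request is capped by the free space left for that product.
    """
    return [
        min(ACTION_LEVELS[a] * m, m - inv)
        for a, m, inv in zip(actions, max_stock, inventory)
    ]


def apply_shared_capacity(requests: Sequence[float], capacity: float) -> List[float]:
    """Scale all requests down proportionally when their total exceeds
    the shared capacity (an assumed rule, chosen for simplicity)."""
    total = sum(requests)
    if total <= capacity:
        return list(requests)
    scale = capacity / total
    return [r * scale for r in requests]
```

Because each product's action is chosen from a small discrete set, one policy can be evaluated per product in parallel, and the shared-capacity step then reconciles the individual requests.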
