论文标题
学习订购销售损失和不确定用品的库存系统
Learning to Order for Inventory Systems with Lost Sales and Uncertain Supplies
论文作者
论文摘要
我们考虑了一个随机损失销售库存控制系统,该系统在计划中$ t $上有交货时间$ l $。供应不确定,并且是订单数量(由于随机产量/容量等)的函数。我们的目标是最大程度地减少$ t $ - 周期成本,即使在已知的需求和供应分布下,也已知这一问题在计算上也很棘手。在本文中,我们假设需求和供应分布均未知并开发出一种计算有效的在线学习算法。我们表明,我们的算法在$ O(l+\ sqrt {t})$($ l \ geq \ log log(t)$时,我们的算法成本与超过$ t $时期的最佳策略成本之间的性能差距(即在$ t $周期超过$ t $时期的成本之间的性能差距)。我们这样做1)表明我们的算法成本最多$ O(L+\ sqrt {t})$对于任何$ L \ geq 0 $,与完整信息(众所周知且广泛使用的算法)和2)相比,与现有文献的最佳恒定级策略相比,利用其已知的绩效保证。据我们所知,有限的样本$ O(\ sqrt {t})$($ l $中的多项式)遗憾的是,在在线库存控制文献中以前不知道针对最佳策略的基准标准。这个学习问题的一个关键挑战是,可以审查需求和供应数据。因此,只能观察到截短的值。我们通过证明订单数量$ q^2 $生成的数据使我们能够模拟这一挑战,这使我们能够模拟所有$ q^2 $的性能,还可以为所有$ q^1 <q^2 $ $ q^1 $,即使在数据检查下,也可以获得足够的信息。通过建立高概率耦合参数,我们能够在有限的时间范围内评估和比较其稳定状态下不同顺序策略的性能。由于该问题缺乏凸度,因此我们开发了一种活跃的消除方法,可以适应地排除次优的解决方案。
We consider a stochastic lost-sales inventory control system with a lead time $L$ over a planning horizon $T$. Supply is uncertain, and is a function of the order quantity (due to random yield/capacity, etc). We aim to minimize the $T$-period cost, a problem that is known to be computationally intractable even under known distributions of demand and supply. In this paper, we assume that both the demand and supply distributions are unknown and develop a computationally efficient online learning algorithm. We show that our algorithm achieves a regret (i.e. the performance gap between the cost of our algorithm and that of an optimal policy over $T$ periods) of $O(L+\sqrt{T})$ when $L\geq\log(T)$. We do so by 1) showing our algorithm cost is higher by at most $O(L+\sqrt{T})$ for any $L\geq 0$ compared to an optimal constant-order policy under complete information (a well-known and widely-used algorithm) and 2) leveraging its known performance guarantee from the existing literature. To the best of our knowledge, a finite-sample $O(\sqrt{T})$ (and polynomial in $L$) regret bound when benchmarked against an optimal policy is not known before in the online inventory control literature. A key challenge in this learning problem is that both demand and supply data can be censored; hence only truncated values are observable. We circumvent this challenge by showing that the data generated under an order quantity $q^2$ allows us to simulate the performance of not only $q^2$ but also $q^1$ for all $q^1<q^2$, a key observation to obtain sufficient information even under data censoring. By establishing a high probability coupling argument, we are able to evaluate and compare the performance of different order policies at their steady state within a finite time horizon. Since the problem lacks convexity, we develop an active elimination method that adaptively rules out suboptimal solutions.