论文标题

库存和有限开关限制下的MNL伴侣

MNL-Bandits under Inventory and Limited Switches Constraints

论文作者

Zhang, Hongbin, Yang, Yu, Wu, Feng, Zhang, Qixin

论文摘要

优化各种产品以向客户展示是增加离线和在线零售商收入的关键。在探索客户的偏好和利用从数据中学到的客户选择之间的折衷,通过采用多名夜logit(MNL)选择模型来捕获客户对产品的选择,我们研究了在计划中$ t $优化分类的问题,以最大程度地利用零售商的利润。为了使问题设置更加实用,我们同时考虑库存约束和有限的开关约束,零售商在时间$ t $之前无法用尽资源库存,并且被禁止将显示的分类切换给客户太多次。当在线零售商想要动态优化客户群体的各种选择时,这种设置适合这种情况。我们开发了一种有效的UCB状算法,以优化分类,同时从数据中学习客户的选择。我们证明,如果允许$ o(t^α)$开关,我们的算法可以实现子线性后悔绑定的$ \ tilde {o} \ left(t^{1-α/2} \ right)$。 %,我们的遗憾对$ t $是最佳的。广泛的数值实验表明,我们的算法优于基准和算法的性能与理论上界之间的差距很小。

Optimizing the assortment of products to display to customers is a key to increasing revenue for both offline and online retailers. To trade-off between exploring customers' preference and exploiting customers' choices learned from data, in this paper, by adopting the Multi-Nomial Logit (MNL) choice model to capture customers' choices over products, we study the problem of optimizing assortments over a planning horizon $T$ for maximizing the profit of the retailer. To make the problem setting more practical, we consider both the inventory constraint and the limited switches constraint, where the retailer cannot use up the resource inventory before time $T$ and is forbidden to switch the assortment shown to customers too many times. Such a setting suits the case when an online retailer wants to dynamically optimize the assortment selection for a population of customers. We develop an efficient UCB-like algorithm to optimize the assortments while learning customers' choices from data. We prove that our algorithm can achieve a sub-linear regret bound $\tilde{O}\left(T^{1-α/2}\right)$ if $O(T^α)$ switches are allowed. %, and our regret bound is optimal with respect to $T$. Extensive numerical experiments show that our algorithm outperforms baselines and the gap between our algorithm's performance and the theoretical upper bound is small.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源