Paper Title
Online Regularization towards Always-Valid High-Dimensional Dynamic Pricing
Paper Authors
Paper Abstract
Devising a dynamic pricing policy with an always-valid online statistical learning procedure is an important and as yet unresolved problem. Most existing dynamic pricing policies focus on the faithfulness of the adopted customer choice model and have limited ability to adapt to the online uncertainty of the learned statistical model during the pricing process. In this paper, we propose a novel approach for designing dynamic pricing policies based on regularized online statistical learning with theoretical guarantees. The new approach overcomes the challenge of continuously monitoring an online Lasso procedure and possesses several appealing properties. In particular, we make the decisive observation that the always-validity of pricing decisions builds and thrives on the online regularization scheme. Our proposed online regularization scheme equips the proposed optimistic online regularized maximum likelihood pricing (OORMLP) policy with three major advantages: it encodes market noise knowledge into pricing process optimism; it empowers online statistical learning with always-validity over all decision points; and it envelops the prediction error process with time-uniform non-asymptotic oracle inequalities. These non-asymptotic inference results allow us to design more sample-efficient and robust dynamic pricing algorithms in practice. In theory, the proposed OORMLP algorithm exploits the sparsity structure of high-dimensional models and secures a logarithmic regret over the decision horizon. These theoretical advances are made possible by an optimistic online Lasso procedure that resolves dynamic pricing problems at the process level, based on a novel use of non-asymptotic martingale concentration. In experiments, we evaluate OORMLP in different synthetic and real pricing problem settings and demonstrate that OORMLP advances the state-of-the-art methods.
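To make the setting concrete, below is a minimal Python sketch of a Lasso-regularized dynamic pricing loop in the spirit of the abstract. It is an illustration only, not the authors' OORMLP algorithm: the sparse linear valuation model, the logistic market noise, the shrinking optimism bonus, and the per-round refit of an L1-penalized logistic MLE (used here as a batch stand-in for the paper's online Lasso procedure) are all assumptions introduced for this sketch.

# Hypothetical sketch: Lasso-regularized pricing with a sparse linear valuation model.
# Assumed (not from the paper): v_t = x_t' theta* + eps_t with logistic noise eps_t,
# so the purchase probability is sigmoid(x_t' theta* - p_t); the estimator is an
# L1-penalized logistic MLE refit each round as a stand-in for the online Lasso.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, T, s = 50, 500, 5                         # ambient dimension, horizon, sparsity
theta_star = np.zeros(d)
theta_star[:s] = rng.uniform(0.5, 1.0, s)    # sparse true valuation parameter

X_hist, y_hist = [], []
theta_hat = np.zeros(d)

for t in range(T):
    x_t = rng.normal(size=d) / np.sqrt(d)    # product/customer features

    # Posted price: plug-in estimate plus a shrinking optimism bonus
    # (illustrative pricing rule, not the paper's exact optimism construction).
    bonus = 1.0 / np.sqrt(t + 1)
    p_t = max(x_t @ theta_hat + bonus, 0.0)

    # Customer buys if the latent valuation exceeds the posted price.
    v_t = x_t @ theta_star + rng.logistic()
    y_t = float(v_t >= p_t)

    X_hist.append(np.append(x_t, -p_t))      # price enters the logit linearly
    y_hist.append(y_t)

    # Refit an L1-penalized logistic MLE on all data so far; the tuning rate
    # lam ~ sqrt(log(d) / n) mimics standard Lasso theory.
    if t >= 10 and len(set(y_hist)) == 2:
        lam = np.sqrt(np.log(d) / (t + 1))
        model = LogisticRegression(penalty="l1", C=1.0 / lam,
                                   solver="liblinear", fit_intercept=False)
        model.fit(np.array(X_hist), np.array(y_hist))
        theta_hat = model.coef_.ravel()[:d]  # drop the fitted price coefficient

print("estimated support:", np.flatnonzero(np.abs(theta_hat) > 0.1))

In this toy run, the recovered support should concentrate on the first few coordinates, which is the kind of sparsity exploitation the abstract credits for the logarithmic-regret guarantee; the always-valid (time-uniform) confidence machinery of the actual OORMLP procedure is not reproduced here.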