Paper Title
Learning to Price Vehicle Service with Unknown Demand
Paper Authors
Paper Abstract
It can be profitable for vehicle service providers to set service prices based on users' travel demand on different origin-destination pairs. The prior studies on the spatial pricing of vehicle service rely on the assumption that providers know users' demand. In this paper, we study a monopolistic provider who initially does not know users' demand and needs to learn it over time by observing the users' responses to the service prices. We design a pricing and vehicle supply policy, considering the tradeoff between exploration (i.e., learning the demand) and exploitation (i.e., maximizing the provider's short-term payoff). Considering that the provider needs to ensure the vehicle flow balance at each location, its pricing and supply decisions for different origin-destination pairs are tightly coupled. This makes it challenging to theoretically analyze the performance of our policy. We analyze the gap between the provider's expected time-average payoffs under our policy and a clairvoyant policy, which makes decisions based on complete information of the demand. We prove that after running our policy for D days, the loss in the expected time-average payoff can be at most O((ln D)^0.5 D^(-0.25)), which decays to zero as D approaches infinity.
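To give a feel for how the stated bound behaves, the following minimal sketch evaluates the expression (ln D)^0.5 D^(-0.25) for a few horizons D. The abstract gives only the order of the bound, so the hidden constant is assumed to be 1 here, and the function name `regret_bound_shape` and the chosen horizons are illustrative, not taken from the paper.

```python
import math


def regret_bound_shape(days: int) -> float:
    """Evaluate (ln D)^0.5 * D^(-0.25), the shape of the bound.

    The constant factor hidden by the O-notation is unknown and is
    assumed to be 1 for illustration only.
    """
    if days < 1:
        raise ValueError("days must be a positive integer")
    return math.sqrt(math.log(days)) * days ** -0.25


# Illustrative horizons (hypothetical, not from the paper).
horizons = [10, 100, 1_000, 10_000, 100_000, 1_000_000]
values = [regret_bound_shape(d) for d in horizons]

for d, v in zip(horizons, values):
    print(f"D = {d:>9,d}  (ln D)^0.5 * D^(-0.25) = {v:.4f}")
```

The expression peaks at D = e^2 (about 7.4 days) and decreases from there on, so the printed values shrink steadily. The decay is slow, because the D^(-0.25) factor has to overcome the growing (ln D)^0.5 factor.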