Scalable Deep Reinforcement Learning for Ride-Hailing

Paper Title

Scalable Deep Reinforcement Learning for Ride-Hailing

Authors

Jiekun Feng, Mark Gluzman, J. G. Dai

Abstract

Ride-hailing services, such as Didi Chuxing, Lyft, and Uber, arrange thousands of cars to meet ride requests throughout the day. We consider a Markov decision process (MDP) model of a ride-hailing service system, framing it as a reinforcement learning (RL) problem. The simultaneous control of many agents (cars) presents a challenge for the MDP optimization because the action space grows exponentially with the number of cars. We propose a special decomposition for the MDP actions by sequentially assigning tasks to the drivers. The new action structure resolves the scalability problem and enables the use of deep RL algorithms for control policy optimization. We demonstrate the benefit of our proposed decomposition with a numerical experiment based on real data from Didi Chuxing.
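
The scalability argument in the abstract can be illustrated with a simple count. The sketch below is not the authors' implementation; it only compares the number of joint actions when all cars choose at once with the number of options scored when cars are assigned one at a time. The function names and the values for the number of cars and tasks are hypothetical, and treating every car as having the same number of candidate tasks is a simplifying assumption.

```python
# Illustrative counting only. The function names and all numeric values are
# hypothetical and are not taken from the paper.


def joint_action_count(num_cars: int, num_tasks: int) -> int:
    """Joint actions when every car picks one of num_tasks simultaneously."""
    return num_tasks ** num_cars


def sequential_option_count(num_cars: int, num_tasks: int) -> int:
    """Options scored in total when cars are assigned one at a time."""
    return num_cars * num_tasks


if __name__ == "__main__":
    tasks = 5  # assumed number of candidate tasks per car
    for cars in (2, 5, 10):
        print(
            f"cars={cars}: joint={joint_action_count(cars, tasks)}, "
            f"sequential={sequential_option_count(cars, tasks)}"
        )
```

Under these assumptions the joint count grows as a power of the number of cars, while the sequential count grows linearly, which is the kind of gap the proposed decomposition is meant to close.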
