Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers

Paper Title

Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers

Authors

Krnjaic, Aleksandar; Steleac, Raul D.; Thomas, Jonathan D.; Papoudakis, Georgios; Schäfer, Lukas; To, Andrew Wing Keung; Lao, Kuan-Ho; Cubuktepe, Murat; Haley, Matthew; Börsting, Peter; Albrecht, Stefano V.

Abstract

We consider a warehouse in which dozens of mobile robots and human pickers work together to collect and deliver items within the warehouse. The fundamental problem we tackle, called the order-picking problem, is how these worker agents must coordinate their movement and actions in the warehouse to maximise performance in this task. Established industry methods using heuristic approaches require large engineering efforts to optimise for innately variable warehouse configurations. In contrast, multi-agent reinforcement learning (MARL) can be flexibly applied to diverse warehouse configurations (e.g. size, layout, number/types of workers, item replenishment frequency), and different types of order-picking paradigms (e.g. Goods-to-Person and Person-to-Goods), as the agents can learn how to cooperate optimally through experience. We develop hierarchical MARL algorithms in which a manager agent assigns goals to worker agents, and the policies of the manager and workers are co-trained toward maximising a global objective (e.g. pick rate). Our hierarchical algorithms achieve significant gains in sample efficiency over baseline MARL algorithms and overall pick rates over multiple established industry heuristics in a diverse set of warehouse configurations and different order-picking paradigms.
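
To make the manager/worker structure described in the abstract more concrete, here is a minimal sketch of the control loop only. It is not the authors' algorithm: the learned manager and worker policies are replaced by simple hand-written stand-ins (nearest-item assignment and a one-cell greedy move), the warehouse is assumed to be an obstacle-free grid, and every name (`Worker`, `manager_assign`, `worker_step`, `run_episode`) is hypothetical. The value returned plays the role of the shared global objective (picks per time step).

```python
from dataclasses import dataclass
from typing import Optional

Cell = tuple[int, int]  # (x, y) grid coordinate; the grid layout is an assumption


@dataclass
class Worker:
    pos: Cell
    goal: Optional[Cell] = None  # item location currently assigned by the manager


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def manager_assign(workers: list[Worker], pending: list[Cell]) -> None:
    """Stand-in for a learned manager policy: each idle worker gets the
    nearest item location that no other worker has claimed."""
    claimed = {w.goal for w in workers if w.goal is not None}
    for w in workers:
        if w.goal is None:
            free = [c for c in pending if c not in claimed]
            if free:
                w.goal = min(free, key=lambda c: manhattan(w.pos, c))
                claimed.add(w.goal)


def worker_step(w: Worker) -> None:
    """Stand-in for a learned worker policy: move one cell toward the goal,
    along x first, then along y."""
    if w.goal is None:
        return
    x, y = w.pos
    gx, gy = w.goal
    if x != gx:
        x += 1 if gx > x else -1
    elif y != gy:
        y += 1 if gy > y else -1
    w.pos = (x, y)


def run_episode(workers: list[Worker], items: list[Cell], max_steps: int = 50) -> float:
    """Run the hierarchy until all items are picked or max_steps is reached.
    Returns picks per step, a single objective shared by manager and workers."""
    pending = list(items)
    picks = 0
    steps = 0
    for _ in range(max_steps):
        if not pending:
            break
        manager_assign(workers, pending)      # high level: assign goals
        for w in workers:
            worker_step(w)                    # low level: act toward the goal
            if w.goal is not None and w.pos == w.goal:
                pending.remove(w.goal)        # item picked
                w.goal = None                 # worker becomes idle again
                picks += 1
        steps += 1
    return picks / steps if steps else 0.0


if __name__ == "__main__":
    team = [Worker((0, 0)), Worker((4, 4))]
    print(run_episode(team, [(0, 2), (4, 1)]))
```

In a learned version of this loop, both stand-in functions would be replaced by trainable policies updated toward the same pick-rate signal, which is the co-training idea the abstract describes.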
