An Auction-based Coordination Strategy for Task-Constrained Multi-Agent Stochastic Planning with Submodular Rewards

Paper title

An Auction-based Coordination Strategy for Task-Constrained Multi-Agent Stochastic Planning with Submodular Rewards

Authors

Liu, Ruifan; Shin, Hyo-Sang; Yan, Binbin; Tsourdos, Antonios

Abstract

In many domains, such as transportation and logistics, search and rescue, and cooperative surveillance, tasks must be allocated while accounting for possible execution uncertainties. Existing task coordination algorithms either ignore the stochastic process or suffer from high computational cost. Taking advantage of the weakly coupled structure of the problem and the opportunity to coordinate in advance, we propose a decentralized auction-based coordination strategy that uses a newly formulated score function, obtained by formulating the problem as task-constrained Markov decision processes (MDPs). Under the assumption of a submodular reward function, the proposed method is guaranteed to converge and to achieve at least 50% of the optimal value. Furthermore, for large-scale applications, we also propose an approximate variant of the method, named Deep Auction, which uses neural networks and thereby avoids the burden of explicitly constructing the MDPs. Inspired by the well-known actor-critic architecture, two Transformers are used to map observations to action probabilities and cumulative rewards, respectively. Finally, we demonstrate the performance of the two proposed approaches in the context of drone deliveries, where stochastic planning for the drone fleet is cast as a stochastic price-collecting Vehicle Routing Problem (VRP) with time windows. Simulation results are compared with state-of-the-art methods in terms of solution quality, planning efficiency, and scalability.
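
The 50% figure in the abstract matches the well-known bound for greedy selection under a monotone submodular objective with per-agent capacity limits. The sketch below is only a rough illustration of that idea, not the paper's method: it runs a centralized greedy auction in which the bid for an (agent, task) pair is its marginal gain in team reward. The names `SUCCESS_PROB`, `TASK_VALUE`, `CAPACITY`, `team_reward`, and `greedy_auction`, and all numbers, are hypothetical; the paper's MDP-derived score function and its decentralized protocol are not reproduced here.

```python
from itertools import product

# Hypothetical toy data: probability that each agent completes each task.
SUCCESS_PROB = {
    "a1": {"t1": 0.9, "t2": 0.5, "t3": 0.2},
    "a2": {"t1": 0.6, "t2": 0.8, "t3": 0.7},
}
TASK_VALUE = {"t1": 10.0, "t2": 6.0, "t3": 4.0}  # hypothetical task rewards
CAPACITY = 2  # assumed limit on tasks per agent


def team_reward(assignment):
    """Expected team reward for a set of (agent, task) pairs.

    A task pays its value if at least one assigned agent succeeds, so
    extra agents on the same task give diminishing returns (submodular).
    """
    total = 0.0
    for task, value in TASK_VALUE.items():
        p_all_fail = 1.0
        for agent, assigned_task in assignment:
            if assigned_task == task:
                p_all_fail *= 1.0 - SUCCESS_PROB[agent][task]
        total += value * (1.0 - p_all_fail)
    return total


def greedy_auction(agents, tasks, capacity):
    """Repeatedly award the (agent, task) pair with the highest marginal gain."""
    assignment = set()
    load = {agent: 0 for agent in agents}
    while True:
        base = team_reward(assignment)
        best_bid, best_pair = 0.0, None
        for agent, task in product(agents, tasks):
            if load[agent] >= capacity or (agent, task) in assignment:
                continue
            # Bid = marginal gain of adding this pair to the current awards.
            bid = team_reward(assignment | {(agent, task)}) - base
            if bid > best_bid + 1e-12:
                best_bid, best_pair = bid, (agent, task)
        if best_pair is None:  # no feasible pair improves the reward
            return assignment
        assignment.add(best_pair)
        load[best_pair[0]] += 1


if __name__ == "__main__":
    awards = greedy_auction(list(SUCCESS_PROB), list(TASK_VALUE), CAPACITY)
    print(sorted(awards), team_reward(awards))
```

On an instance this small, the greedy result can be checked against a brute-force search over all feasible assignments to confirm that it stays within the factor-of-two bound.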
