Semi-Supervised Imitation Learning of Team Policies from Suboptimal Demonstrations

Paper Title

Semi-Supervised Imitation Learning of Team Policies from Suboptimal Demonstrations

Authors

Sangwon Seo, Vaibhav V. Unhelkar

Abstract

We present Bayesian Team Imitation Learner (BTIL), an imitation learning algorithm to model the behavior of teams performing sequential tasks in Markovian domains. In contrast to existing multi-agent imitation learning techniques, BTIL explicitly models and infers the time-varying mental states of team members, thereby enabling learning of decentralized team policies from demonstrations of suboptimal teamwork. Further, to allow for sample- and label-efficient policy learning from small datasets, BTIL employs a Bayesian perspective and is capable of learning from semi-supervised demonstrations. We demonstrate and benchmark the performance of BTIL on synthetic multi-agent tasks as well as a novel dataset of human-agent teamwork. Our experiments show that BTIL can successfully learn team policies from demonstrations despite the influence of team members' (time-varying and potentially misaligned) mental states on their behavior.
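The idea of inferring a team member's time-varying mental state from observed behavior can be illustrated with a small sketch. This is not the BTIL algorithm; it is a toy forward filter over a single agent's discrete latent state, assuming the policy and the mental-state dynamics are already known. The function name `filter_mental_state` and the tables `policy`, `transition`, and `prior` are hypothetical and introduced only for illustration.

```python
def filter_mental_state(states, actions, policy, transition, prior):
    """Forward-filter a discrete latent 'mental state' from (state, action) pairs.

    All inputs are illustrative assumptions, not quantities from the paper:
      policy[x][s][a]   : probability of action a in task state s under mental state x
      transition[x][x2] : probability that mental state x changes to x2 in one step
      prior[x]          : initial distribution over mental states
    Returns a list of posteriors, one per time step.
    """
    belief = list(prior)
    n = len(belief)
    posteriors = []
    for t, (s, a) in enumerate(zip(states, actions)):
        if t > 0:
            # Predict: propagate the belief through the mental-state dynamics.
            belief = [
                sum(belief[x] * transition[x][x2] for x in range(n))
                for x2 in range(n)
            ]
        # Update: weight each mental state by how well it explains the action.
        belief = [belief[x] * policy[x][s][a] for x in range(n)]
        total = sum(belief)
        belief = [b / total for b in belief]
        posteriors.append(belief)
    return posteriors


# Toy problem: 2 mental states, 1 task state, 2 actions.
policy = [
    [[0.9, 0.1]],  # mental state 0 mostly picks action 0
    [[0.2, 0.8]],  # mental state 1 mostly picks action 1
]
transition = [[0.9, 0.1], [0.1, 0.9]]  # mental states tend to persist
prior = [0.5, 0.5]

posteriors = filter_mental_state([0, 0, 0], [0, 1, 1], policy, transition, prior)
for t, p in enumerate(posteriors):
    print(t, [round(v, 3) for v in p])
```

In this toy run the belief starts out favoring mental state 0 after the first action and shifts toward mental state 1 once the observed actions change, which is the kind of time-varying inference the abstract describes. Learning the policy itself, handling multiple agents, and using partially labeled mental states are outside the scope of this sketch.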
