Paper Title
Semi-Supervised Imitation Learning of Team Policies from Suboptimal Demonstrations
Authors
Abstract
We present the Bayesian Team Imitation Learner (BTIL), an imitation learning algorithm to model the behavior of teams performing sequential tasks in Markovian domains. In contrast to existing multi-agent imitation learning techniques, BTIL explicitly models and infers the time-varying mental states of team members, thereby enabling learning of decentralized team policies from demonstrations of suboptimal teamwork. Further, to allow for sample- and label-efficient policy learning from small datasets, BTIL employs a Bayesian perspective and is capable of learning from semi-supervised demonstrations. We demonstrate and benchmark the performance of BTIL on synthetic multi-agent tasks as well as a novel dataset of human-agent teamwork. Our experiments show that BTIL can successfully learn team policies from demonstrations despite the influence of team members' (time-varying and potentially misaligned) mental states on their behavior.