Paper title
Sequential Causal Imitation Learning with Unobserved Confounders
Paper authors
Paper abstract
"Monkey see monkey do" is an age-old adage, referring to naïve imitation without a deep understanding of a system's underlying mechanics. Indeed, if a demonstrator has access to information unavailable to the imitator (monkey), such as a different set of sensors, then no matter how perfectly the imitator models its perceived environment (See), attempting to reproduce the demonstrator's behavior (Do) can lead to poor outcomes. Imitation learning in the presence of a mismatch between demonstrator and imitator has been studied in the literature under the rubric of causal imitation learning (Zhang et al., 2020), but existing solutions are limited to single-stage decision-making. This paper investigates the problem of causal imitation learning in sequential settings, where the imitator must make multiple decisions per episode. We develop a graphical criterion that is necessary and sufficient for determining the feasibility of causal imitation, providing conditions when an imitator can match a demonstrator's performance despite differing capabilities. Finally, we provide an efficient algorithm for determining imitability and corroborate our theory with simulations.