Provable Representation Learning for Imitation Learning via Bi-level Optimization

Paper Title

Provable Representation Learning for Imitation Learning via Bi-level Optimization

Authors

Sanjeev Arora, Simon S. Du, Sham Kakade, Yuping Luo, Nikunj Saunshi

Abstract

A common strategy in modern learning systems is to learn a representation that is useful for many tasks, a.k.a. representation learning. We study this strategy in the imitation learning setting for Markov decision processes (MDPs) where multiple experts' trajectories are available. We formulate representation learning as a bi-level optimization problem where the "outer" optimization tries to learn the joint representation and the "inner" optimization encodes the imitation learning setup and tries to learn task-specific parameters. We instantiate this framework for the imitation learning settings of behavior cloning and observation-alone. Theoretically, we show using our framework that representation learning can provide sample complexity benefits for imitation learning in both settings. We also provide proof-of-concept experiments to verify our theory.
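
The bi-level structure described in the abstract can be sketched as a nested minimization. The notation here is an assumption made for illustration and is not taken from this page: $T$ is the number of experts (tasks), $\phi \in \Phi$ is the shared representation, $\pi_t \in \Pi$ is the task-specific policy for expert $t$, and $\ell_t$ is an imitation loss for task $t$ (for example, a behavior cloning loss on that expert's trajectories).

```latex
% Outer problem: pick one representation \phi shared by all T tasks.
% Inner problem: for each task t, fit task-specific parameters \pi_t on top of \phi.
\min_{\phi \in \Phi} \; \frac{1}{T} \sum_{t=1}^{T} \; \min_{\pi_t \in \Pi} \; \ell_t\!\left(\pi_t \circ \phi\right)
```

Under this reading, the inner minimizations are independent across tasks once $\phi$ is fixed, so only the outer variable couples the experts' data.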
