Title

Multi-Task Imitation Learning for Linear Dynamical Systems

Authors

Zhang, Thomas T., Kang, Katie, Lee, Bruce D., Tomlin, Claire, Levine, Sergey, Tu, Stephen, Matni, Nikolai

Abstract

We study representation learning for efficient imitation learning over linear systems. In particular, we consider a setting where learning is split into two phases: (a) a pre-training step where a shared $k$-dimensional representation is learned from $H$ source policies, and (b) a target policy fine-tuning step where the learned representation is used to parameterize the policy class. We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left( \frac{k n_x}{HN_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}}\right)$, where $n_x > k$ is the state dimension, $n_u$ is the input dimension, $N_{\mathrm{shared}}$ denotes the total amount of data collected for each policy during representation learning, and $N_{\mathrm{target}}$ is the amount of target task data. This result formalizes the intuition that aggregating data across related tasks to learn a representation can significantly improve the sample efficiency of learning a target task. The trends suggested by this bound are corroborated in simulation.
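The two-phase pipeline in the abstract can be illustrated with a minimal noiseless simulation. This is a sketch under illustrative assumptions, not the paper's exact algorithm or experiments: each expert is a linear state-feedback policy $u = K_h x$ with $K_h = F_h \Phi$ for a shared $k \times n_x$ representation $\Phi$, the states are sampled i.i.d. rather than from closed-loop trajectories, and the representation is estimated by a simple SVD of stacked per-task least-squares policy estimates. All variable names (`Phi_true`, `F_target_true`, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_u, k, H = 10, 3, 2, 5          # state dim, input dim, rep. dim, # source tasks
N_shared, N_target = 200, 50          # per-task pre-training data, target-task data

# Hypothetical ground truth: every policy factors through a shared representation.
Phi_true = rng.standard_normal((k, n_x))
F_true = [rng.standard_normal((n_u, k)) for _ in range(H)]
F_target_true = rng.standard_normal((n_u, k))
K_target_true = F_target_true @ Phi_true

# Phase (a): estimate each source policy by least squares, then take the
# top-k right singular subspace of the stacked estimates as the learned
# shared representation (an illustrative estimator, not the paper's method).
K_hats = []
for h in range(H):
    X = rng.standard_normal((N_shared, n_x))   # observed states
    U = X @ (F_true[h] @ Phi_true).T           # expert actions (noiseless)
    K_hats.append(np.linalg.lstsq(X, U, rcond=None)[0].T)
_, _, Vt = np.linalg.svd(np.vstack(K_hats))
Phi_hat = Vt[:k]                               # learned k x n_x representation

# Phase (b): fine-tune only the small n_u x k head on target-task data,
# keeping the learned representation fixed.
X_t = rng.standard_normal((N_target, n_x))
U_t = X_t @ K_target_true.T
F_hat = np.linalg.lstsq(X_t @ Phi_hat.T, U_t, rcond=None)[0].T
K_hat = F_hat @ Phi_hat                        # learned target policy

err = np.linalg.norm(K_hat - K_target_true)    # near zero in this noiseless setup
```

Note that the fine-tuning step fits only $n_u \cdot k$ parameters instead of the full $n_u \cdot n_x$, which is the mechanism behind the $\frac{k n_u}{N_{\mathrm{target}}}$ term in the bound.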
