Title
Periodic Intra-Ensemble Knowledge Distillation for Reinforcement Learning
Authors
Abstract
Off-policy ensemble reinforcement learning (RL) methods have demonstrated impressive results across a range of RL benchmark tasks. Recent works suggest that directly imitating experts' policies in a supervised manner before or during the course of training enables faster policy improvement for an RL agent. Motivated by these recent insights, we propose Periodic Intra-Ensemble Knowledge Distillation (PIEKD). PIEKD is a learning framework that uses an ensemble of policies to act in the environment while periodically sharing knowledge amongst policies in the ensemble through knowledge distillation. Our experiments demonstrate that PIEKD improves upon a state-of-the-art RL method in sample efficiency on several challenging MuJoCo benchmark tasks. Additionally, we perform ablation studies to better understand PIEKD.
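The core idea in the abstract, maintaining an ensemble of policies that each learn independently while periodically distilling knowledge from the best member into the rest, can be sketched in a toy form. The snippet below is a minimal illustration, not the paper's implementation: the "environment", the linear policies, and the gradient update standing in for an off-policy RL step (such as SAC) are all hypothetical simplifications. Only the periodic-distillation structure mirrors PIEKD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy task: a linear "policy" maps a state to action scores,
# and reward is higher the closer its scores are to a hidden target mapping.
STATE_DIM, N_ACTIONS, ENSEMBLE_SIZE = 4, 3, 3
true_w = rng.normal(size=(STATE_DIM, N_ACTIONS))

def avg_return(w, n=64):
    """Average reward of a policy: negative squared error of its action scores."""
    s = rng.normal(size=(n, STATE_DIM))
    return float(-np.mean((s @ w - s @ true_w) ** 2))

# Ensemble of independently initialized policies.
policies = [rng.normal(size=(STATE_DIM, N_ACTIONS)) for _ in range(ENSEMBLE_SIZE)]
init_policies = [w.copy() for w in policies]

STEPS, DISTILL_PERIOD, LR, DISTILL_LR = 200, 50, 0.05, 0.5

for step in range(1, STEPS + 1):
    # Each member improves independently (here: a noisy gradient step,
    # a stand-in for the member's own off-policy RL update).
    for w in policies:
        s = rng.normal(size=(8, STATE_DIM))
        grad = s.T @ (s @ w - s @ true_w) / len(s)
        w -= LR * (grad + 0.1 * rng.normal(size=w.shape))

    # Periodic intra-ensemble distillation: pull every other member toward
    # the current best member's outputs (a supervised imitation step).
    if step % DISTILL_PERIOD == 0:
        best = max(policies, key=avg_return)
        for w in policies:
            if w is not best:
                w += DISTILL_LR * (best - w)
```

The distillation step here is a direct parameter interpolation toward the best member; in the paper's setting it would instead be a supervised loss on the policies' action distributions, but the periodic "share knowledge, then resume independent learning" loop is the same shape.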