Paper Title
Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning
Paper Authors
Paper Abstract
Despite the significant progress of deep reinforcement learning (RL) in solving sequential decision-making problems, RL agents often overfit to their training environments and struggle to adapt to new, unseen environments. This prevents robust applications of RL in real-world situations, where system dynamics may deviate wildly from the training settings. In this work, our primary contribution is to propose an information-theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents. We demonstrate the extreme generalization benefits of our approach in domains ranging from maze navigation to robotic tasks; for the first time, we show that agents can generalize to test parameters more than 10 standard deviations away from the training parameter distribution. This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for solving the task; it opens doors for the systematic study of generalization from training settings to extremely different testing settings, building on the established connections between information theory and machine learning.
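The abstract describes an information-bottleneck-style regularizer whose strength is annealed over training. As a minimal sketch of that general idea (not the authors' actual implementation; all function names and the linear annealing schedule here are illustrative assumptions), one common instantiation penalizes the KL divergence between a stochastic latent encoding of the state and a fixed unit-Gaussian prior, with the penalty coefficient gradually increased so that task-irrelevant information is progressively squeezed out:

```python
import numpy as np

def diag_gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims.

    This KL term upper-bounds the information the latent code carries
    about its input, which is what the bottleneck penalizes.
    """
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def ib_regularized_loss(task_loss, mu, log_var, beta):
    """Task objective plus a beta-weighted information-bottleneck penalty."""
    return task_loss + beta * diag_gaussian_kl(mu, log_var)

def annealed_beta(step, total_steps, beta_max=1e-2):
    """Illustrative linear annealing schedule: the bottleneck coefficient
    ramps from 0 to beta_max, removing redundant information gradually
    rather than constraining the agent from the start."""
    return beta_max * min(1.0, step / total_steps)

# A unit-Gaussian encoding matches the prior exactly, so its penalty is zero.
mu, log_var = np.zeros(8), np.zeros(8)
assert diag_gaussian_kl(mu, log_var) == 0.0
```

Early in training `annealed_beta` is near zero, so the agent optimizes the task freely; as the coefficient grows, the latent code is pushed toward the prior, retaining only information needed to act.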