Paper Title
Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey
Paper Authors
Paper Abstract
Building autonomous machines that can explore open-ended environments, discover possible interactions and build repertoires of skills is a general objective of artificial intelligence. Developmental approaches argue that this can only be achieved by autotelic agents: intrinsically motivated learning agents that can learn to represent, generate, select and solve their own problems. In recent years, the convergence of developmental approaches with deep reinforcement learning (RL) methods has led to the emergence of a new field: developmental reinforcement learning. Developmental RL is concerned with the use of deep RL algorithms to tackle a developmental problem, namely the intrinsically motivated acquisition of open-ended repertoires of skills. The self-generation of goals requires learning compact goal encodings as well as their associated goal-achievement functions. This raises new challenges compared to standard RL algorithms, which were originally designed to tackle pre-defined sets of goals using external reward signals. The present paper introduces developmental RL and proposes a computational framework based on goal-conditioned RL to tackle the intrinsically motivated skill acquisition problem. It then presents a typology of the various goal representations used in the literature, before reviewing existing methods for learning to represent and prioritize goals in autonomous systems. Finally, we close the paper by discussing some open challenges in the quest for intrinsically motivated skill acquisition.
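The abstract frames autotelic learning as goal-conditioned RL in which the agent generates its own goals and scores its behaviour with a goal-achievement function instead of an external reward. The following is a minimal sketch of that loop under strong simplifying assumptions that do not come from the paper. It uses a hypothetical 10-cell line world. Goals are plain target positions, so the goal encoding is hand-given rather than learned. Goals are sampled uniformly in place of the learned prioritization the survey reviews. Tabular Q-learning serves as the goal-conditioned learner. All names (`step`, `goal_achieved`, `train`, `greedy_rollout`) are illustrative only.

```python
import random
from collections import defaultdict

# Hypothetical toy world (assumption, not from the paper): positions 0..N-1 on a line.
N = 10
ACTIONS = (-1, +1)


def step(state, action):
    """Deterministic transition: move one cell, clipped at both ends."""
    return min(max(state + action, 0), N - 1)


def goal_achieved(state, goal):
    """Goal-achievement function R(s, g): 1.0 when the state matches the goal, else 0.0."""
    return 1.0 if state == goal else 0.0


def greedy_action(q, state, goal, rng=None):
    """Pick the action with the highest goal-conditioned value Q(s, g, a).
    Ties are broken at random when an rng is given, otherwise deterministically."""
    values = [q[(state, goal, a)] for a in ACTIONS]
    best = max(values)
    candidates = [a for a, v in zip(ACTIONS, values) if v == best]
    return rng.choice(candidates) if rng is not None else candidates[0]


def train(episodes=3000, horizon=20, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # one table shared by all goals, indexed by (state, goal, action)
    for _ in range(episodes):
        goal = rng.randrange(N)   # the agent samples its own goal (uniformly, an assumption)
        state = rng.randrange(N)  # random reset position (assumption)
        for _ in range(horizon):
            if rng.random() < eps:
                action = rng.choice(ACTIONS)  # epsilon-greedy exploration
            else:
                action = greedy_action(q, state, goal, rng)
            nxt = step(state, action)
            reward = goal_achieved(nxt, goal)  # computed by the agent, no external reward
            done = reward > 0
            target = reward if done else gamma * max(q[(nxt, goal, a)] for a in ACTIONS)
            q[(state, goal, action)] += alpha * (target - q[(state, goal, action)])
            state = nxt
            if done:
                break
    return q


def greedy_rollout(q, goal, start=0, horizon=20):
    """Follow the greedy policy for `goal`; return the number of steps taken, or None."""
    state = start
    for t in range(horizon):
        if state == goal:
            return t
        state = step(state, greedy_action(q, state, goal))
    return horizon if state == goal else None
```

Calling `q = train()` and then `greedy_rollout(q, 7)` follows the learned goal-conditioned policy from position 0 toward goal 7. A single table indexed by `(state, goal, action)` stands in for a goal-conditioned policy. The uniform `rng.randrange(N)` goal sampler is where a goal-prioritization method (for example one driven by competence or learning progress) would plug in. Likewise, the hand-coded integer goals are where a learned goal encoding would go.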