Title

Meta-Reinforcement Learning with Self-Modifying Networks

Authors

Mathieu Chalvidal, Thomas Serre, Rufin VanRullen

Abstract

Deep Reinforcement Learning has demonstrated the potential of neural networks tuned with gradient descent for solving complex tasks in well-delimited environments. However, these neural systems are slow learners producing specialized agents with no mechanism to continue learning beyond their training curriculum. On the contrary, biological synaptic plasticity is persistent and manifold, and has been hypothesized to play a key role in executive functions such as working memory and cognitive flexibility, potentially supporting more efficient and generic learning abilities. Inspired by this, we propose to build networks with dynamic weights, able to continually perform self-reflexive modification as a function of their current synaptic state and action-reward feedback, rather than a fixed network configuration. The resulting model, MetODS (for Meta-Optimized Dynamical Synapses), is a broadly applicable meta-reinforcement learning system able to learn efficient and powerful control rules in the agent policy space. A single layer with dynamic synapses can perform one-shot learning, generalizes navigation principles to unseen environments and manifests a strong ability to learn adaptive motor policies.
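The core idea of the abstract is that the weights themselves evolve during an episode as a function of the synaptic state and reward feedback, instead of staying fixed after training. The following is a minimal illustrative sketch of that idea: a single layer whose weights are updated online by a reward-gated Hebbian rule. Note that this rule is an assumption for illustration only; in MetODS the update rule itself is meta-learned, and all function names here are hypothetical.

```python
import math

def forward(W, x):
    """Read out an activation from the current (dynamic) synapse matrix W."""
    return [math.tanh(sum(w * xj for w, xj in zip(row, x))) for row in W]

def hebbian_update(W, pre, post, reward, lr=0.1):
    """Reward-gated Hebbian update (illustrative stand-in for the
    meta-learned MetODS rule): dW[i][j] = lr * reward * post[i] * pre[j].
    The weights change within the episode, driven by reward feedback."""
    for i, row in enumerate(W):
        for j in range(len(row)):
            row[j] += lr * reward * post[i] * pre[j]
    return W

# One interaction step: the same weights that produce the output are
# then modified by the reward signal, so adaptation happens inside the
# network dynamics rather than through an outer gradient-descent loop.
W = [[0.2, -0.1], [0.0, 0.3]]   # 2x2 dynamic synapse matrix
x = [1.0, -1.0]                  # observation
y = forward(W, x)                # policy read-out
W = hebbian_update(W, x, y, reward=1.0)
```

With a positive reward, co-active pre/post pairs are reinforced (e.g. `W[0][0]` grows), while a negative reward would weaken them; this is the sense in which the synaptic state carries the agent's within-episode learning.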
