Paper Title
Goal-Directed Planning for Habituated Agents by Active Inference Using a Variational Recurrent Neural Network
Paper Authors
Paper Abstract
It is crucial to ask how agents can achieve goals by generating action plans using only partial models of the world acquired through habituated sensory-motor experiences. Although many existing robotics studies use a forward model framework, there are generalization issues with high degrees of freedom. The current study shows that the predictive coding (PC) and active inference (AIF) frameworks, which employ a generative model, can develop better generalization by learning a prior distribution in a low dimensional latent state space representing probabilistic structures extracted from well habituated sensory-motor trajectories. In our proposed model, learning is carried out by inferring optimal latent variables as well as synaptic weights for maximizing the evidence lower bound, while goal-directed planning is accomplished by inferring latent variables for maximizing the estimated lower bound. Our proposed model was evaluated with both simple and complex robotic tasks in simulation, which demonstrated sufficient generalization in learning with limited training data by setting an intermediate value for a regularization coefficient. Furthermore, comparative simulation results show that the proposed model outperforms a conventional forward model in goal-directed planning, due to the learned prior confining the search of motor plans within the range of habituated trajectories.
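The abstract describes two uses of the same generative model: during learning, both the latent variables and the synaptic weights are optimized to maximize the evidence lower bound (ELBO), while during goal-directed planning only the latent variables are inferred so that the predicted trajectory reaches a given goal. The following is a minimal PyTorch-style sketch of that two-phase scheme, not the authors' implementation; the network sizes, the toy data, and names such as PlannerRNN, w_kld, and z_plan are illustrative assumptions, with w_kld standing in for the regularization coefficient mentioned in the abstract.

# Minimal illustrative sketch (assumptions, not the paper's model): a latent z
# drives an RNN that rolls out a sensory-motor trajectory; learning optimizes
# weights plus latents, planning optimizes only the latent toward a goal state.
import torch
import torch.nn as nn

class PlannerRNN(nn.Module):
    def __init__(self, z_dim=2, h_dim=32, x_dim=4, steps=20):
        super().__init__()
        self.steps, self.x_dim = steps, x_dim
        self.z_to_h = nn.Linear(z_dim, h_dim)   # latent sets the initial hidden state
        self.cell = nn.GRUCell(x_dim, h_dim)
        self.out = nn.Linear(h_dim, x_dim)

    def forward(self, z):
        h = torch.tanh(self.z_to_h(z))
        x = torch.zeros(z.size(0), self.x_dim)
        preds = []
        for _ in range(self.steps):             # closed-loop rollout of predictions
            h = self.cell(x, h)
            x = self.out(h)
            preds.append(x)
        return torch.stack(preds, dim=1)        # (batch, steps, x_dim)

def neg_elbo(model, mu, logvar, target, w_kld=1e-3):
    # Reparameterized sample, reconstruction error, and KL term against a
    # unit-Gaussian prior; w_kld weights the KL regularization.
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
    recon = ((model(z) - target) ** 2).mean()
    kld = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
    return recon + w_kld * kld

# Learning phase: infer latent statistics AND synaptic weights.
model = PlannerRNN()
train_traj = torch.randn(8, 20, 4)              # stand-in for habituated trajectories
mu = torch.zeros(8, 2, requires_grad=True)
logvar = torch.zeros(8, 2, requires_grad=True)
opt = torch.optim.Adam(list(model.parameters()) + [mu, logvar], lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    neg_elbo(model, mu, logvar, train_traj).backward()
    opt.step()

# Planning phase: freeze weights, infer a latent whose rollout reaches the goal.
model.requires_grad_(False)
goal = torch.randn(1, 4)
z_plan = torch.zeros(1, 2, requires_grad=True)
plan_opt = torch.optim.Adam([z_plan], lr=1e-2)
for _ in range(500):
    plan_opt.zero_grad()
    loss = ((model(z_plan)[:, -1] - goal) ** 2).mean() + 1e-3 * z_plan.pow(2).mean()
    loss.backward()
    plan_opt.step()

Under this framing, plans are searched only in the latent space shaped by the learned prior over habituated trajectories, which is the mechanism the abstract credits for outperforming a conventional forward model in goal-directed planning.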