Paper title
Episodic Memory for Learning Subjective-Timescale Models
Authors
Abstract
In model-based learning, an agent's model is commonly defined over transitions between consecutive states of an environment, even though planning often requires reasoning over multi-step timescales, where intermediate states are either unnecessary or, worse, accumulate prediction error. In contrast, intelligent behaviour in biological organisms is characterised by the ability to plan over varying temporal scales depending on the context. Inspired by recent work on human time perception, we devise a novel approach to learning a transition dynamics model based on the sequences of episodic memories that define the agent's subjective timescale, over which it learns world dynamics and over which future planning is performed. We implement this in the framework of active inference and demonstrate that the resulting subjective-timescale model (STM) can systematically vary the temporal extent of its predictions while preserving the same computational efficiency. Additionally, we show that STM predictions are more likely to introduce future salient events (for example, new objects coming into view), incentivising exploration of new areas of the environment. As a result, STM produces more informative action-conditioned roll-outs that assist the agent in making better decisions. We validate a significant improvement in our STM agent's performance in the Animal-AI environment against a baseline system trained using the environment's objective-timescale dynamics.
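The idea of learning transitions between episodic memories rather than between consecutive states can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how memories are selected, so the per-step `salience` scores, the `threshold`, and the function name `subjective_transitions` are hypothetical and are not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def subjective_transitions(
    states: Sequence[int],
    actions: Sequence[str],
    salience: Sequence[float],
    threshold: float,
) -> List[Tuple[int, List[str], int]]:
    """Build (memory, actions in between, next memory) training triples.

    Assumptions (hypothetical, for illustration only):
    - actions[t] moves the agent from states[t] to states[t + 1];
    - a state is stored as an episodic memory when its salience
      score is at least `threshold`.
    """
    # Indices of the states that are kept as episodic memories.
    memory_idx = [t for t, s in enumerate(salience) if s >= threshold]

    triples = []
    # Pair each memory with the next one, skipping the states in between.
    for start, end in zip(memory_idx, memory_idx[1:]):
        triples.append((states[start], list(actions[start:end]), states[end]))
    return triples


# Toy trajectory: six states, five actions, made-up salience scores.
states = [0, 1, 2, 3, 4, 5]
actions = ["f", "f", "l", "f", "r"]
salience = [0.9, 0.1, 0.2, 0.8, 0.1, 0.7]

pairs = subjective_transitions(states, actions, salience, threshold=0.5)
```

In this toy case states 0, 3, and 5 are kept as memories, so a model trained on `pairs` would predict from state 0 directly to state 3 in a single step. An objective-timescale baseline would instead train on every consecutive pair of states.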