Paper Title
Hippocampus-Inspired Cognitive Architecture (HICA) for Operant Conditioning
Paper Authors
Paper Abstract
The neural implementation of operant conditioning with few trials is unclear. We propose a Hippocampus-Inspired Cognitive Architecture (HICA) as a neural mechanism for operant conditioning. HICA explains a learning mechanism in which agents can learn a new behavior policy in a few trials, as mammals do in operant conditioning experiments. HICA is composed of two different types of modules. One is a universal learning module type that represents a cortical column in the neocortex gray matter. The working principle is modeled as Modulated Heterarchical Prediction Memory (mHPM). In mHPM, each module learns to predict a succeeding input vector given the sequence of the input vectors from lower layers and the context vectors from higher layers. The prediction is fed into the lower layers as a context signal (top-down feedback signaling), and into the higher layers as an input signal (bottom-up feedforward signaling). Rewards modulate the learning rate in those modules to memorize meaningful sequences effectively. In mHPM, each module updates in a local and distributed way compared to conventional end-to-end learning with backpropagation of the single objective loss. This local structure enables the heterarchical network of modules. The second type is an innate, special-purpose module representing various organs of the brain's subcortical system. Modules modeling organs such as the amygdala, hippocampus, and reward center are pre-programmed to enable instinctive behaviors. The hippocampus plays the role of the simulator. It is an autoregressive prediction model of the top-most level signal with a loop structure of memory, while cortical columns are lower layers that provide detailed information to the simulation. The simulation becomes the basis for learning with few trials and the deliberate planning required for operant conditioning.
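As a rough illustration of the local, reward-modulated update described for mHPM modules, the following is a minimal sketch under stated assumptions, not the authors' implementation. The class name `PredictionModule`, the linear predictor, the delta-rule update, and the modulation rule `base_lr * (1 + reward)` are all hypothetical choices made for this example; the abstract does not specify any of them. It shows a single module that predicts the next input vector from the current input and a context vector, and updates only its own weights from its own prediction error. The exchange of predictions between layers (top-down context, bottom-up input) and the subcortical modules are not modeled here.

```python
import numpy as np


class PredictionModule:
    """Hypothetical stand-in for one mHPM module (not the authors' model)."""

    def __init__(self, input_dim, context_dim, base_lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # One linear map from [input, context] to the predicted next input.
        self.W = rng.normal(scale=0.1, size=(input_dim, input_dim + context_dim))
        self.base_lr = base_lr

    def predict(self, x, context):
        """Predict the next input vector from the current input and context."""
        return self.W @ np.concatenate([x, context])

    def update(self, x, context, x_next, reward=0.0):
        """Local delta-rule update; returns the pre-update mean squared error."""
        z = np.concatenate([x, context])
        error = x_next - self.W @ z
        # Assumed modulation rule: a larger reward gives a larger learning rate.
        lr = self.base_lr * (1.0 + reward)
        self.W += lr * np.outer(error, z)
        return float(np.mean(error ** 2))


def demo(epochs=200):
    """Train one module on a toy cyclic sequence of four one-hot states."""
    states = np.eye(4)
    context = np.ones(2)  # fixed placeholder for a higher-layer context vector
    module = PredictionModule(input_dim=4, context_dim=2)
    first_loss = last_loss = None
    for _ in range(epochs):
        for t in range(4):
            loss = module.update(states[t], context, states[(t + 1) % 4], reward=1.0)
            if first_loss is None:
                first_loss = loss
            last_loss = loss
    return first_loss, last_loss, module


if __name__ == "__main__":
    first, last, _ = demo()
    print(f"first loss: {first:.4f}, last loss: {last:.6f}")
```

Because each module in this sketch uses only its own inputs and its own error, no gradient has to flow between modules, which is the property the abstract contrasts with end-to-end backpropagation of a single objective loss.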