Paper Title


Learning by Competition of Self-Interested Reinforcement Learning Agents

Authors

Chung, Stephen

Abstract


An artificial neural network can be trained by uniformly broadcasting a reward signal to units that implement a REINFORCE learning rule. Though this presents a biologically plausible alternative to backpropagation in training a network, the high variance associated with it renders it impractical to train deep networks. The high variance arises from the inefficient structural credit assignment since a single reward signal is used to evaluate the collective action of all units. To facilitate structural credit assignment, we propose replacing the reward signal to hidden units with the change in the $L^2$ norm of the unit's outgoing weight. As such, each hidden unit in the network is trying to maximize the norm of its outgoing weight instead of the global reward, and thus we call this learning method Weight Maximization. We prove that Weight Maximization is approximately following the gradient of rewards in expectation. In contrast to backpropagation, Weight Maximization can be used to train both continuous-valued and discrete-valued units. Moreover, Weight Maximization solves several major issues of backpropagation relating to biological plausibility. Our experiments show that a network trained with Weight Maximization can learn significantly faster than REINFORCE and slightly slower than backpropagation. Weight Maximization illustrates an example of cooperative behavior automatically arising from a population of self-interested agents in a competitive game without any central coordination.
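To make the mechanism in the abstract concrete, here is a hedged toy sketch (not the paper's exact algorithm or hyperparameters): hidden Bernoulli-logistic units act as REINFORCE learners whose local "reward" is the change in the squared L2 norm of their outgoing weights, while the output unit is a REINFORCE learner that receives the actual global reward. The XOR task, learning rate, and network sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_hid = 2, 8
W1 = rng.normal(0.0, 1.0, (n_hid, n_in))  # input -> hidden weights
w2 = rng.normal(0.0, 1.0, n_hid)          # hidden -> output (outgoing weights)
lr = 0.1                                  # illustrative learning rate

for step in range(3000):
    x = rng.integers(0, 2, n_in).astype(float)
    target = float(int(x[0]) ^ int(x[1]))          # toy XOR task

    # Stochastic forward pass: every unit samples a binary action.
    p_h = sigmoid(W1 @ x)
    h = (rng.random(n_hid) < p_h).astype(float)
    p_y = sigmoid(w2 @ h)
    y = float(rng.random() < p_y)

    r = 1.0 if y == target else -1.0               # global reward

    # Output unit: plain REINFORCE update driven by the global reward.
    sq_norm_before = w2 ** 2
    w2 += lr * r * (y - p_y) * h

    # Hidden units: local reward = change in squared norm of each unit's
    # outgoing weight, used in place of the global reward (Weight Maximization).
    r_h = w2 ** 2 - sq_norm_before
    W1 += lr * r_h[:, None] * ((h - p_h)[:, None] * x[None, :])

# Evaluate the greedy (deterministic) policy on all four XOR inputs.
correct = 0
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    xv = np.array(x, float)
    h = (sigmoid(W1 @ xv) > 0.5).astype(float)
    y = sigmoid(w2 @ h) > 0.5
    correct += int(y == bool(x[0] ^ x[1]))
```

Note how no gradient signal is backpropagated: each hidden unit only observes how its own outgoing weights change, which is the "self-interested agent" view the abstract describes.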
