Paper Title
Deep Reinforcement Learning with Spiking Q-learning
Paper Authors
Abstract
With the help of dedicated neuromorphic hardware, spiking neural networks (SNNs) are expected to realize artificial intelligence (AI) with lower energy consumption. Combining SNNs with deep reinforcement learning (RL) thus offers a promising energy-efficient approach to realistic control tasks. Only a few SNN-based RL methods exist at present. Most of them either lack generalization ability or employ artificial neural networks (ANNs) to estimate the value function during training. The former requires tuning numerous hyper-parameters for each scenario, while the latter limits the application of different types of RL algorithms and ignores the large energy consumption during training. To develop a robust spike-based RL method, we draw inspiration from non-spiking interneurons found in insects and propose the deep spiking Q-network (DSQN), which uses the membrane voltage of non-spiking neurons as the representation of the Q-value and can directly learn robust policies from high-dimensional sensory inputs via end-to-end RL. Experiments on 17 Atari games demonstrate that DSQN is effective and even outperforms the ANN-based deep Q-network (DQN) in most games. Moreover, the experiments show DSQN's superior learning stability and robustness to adversarial attacks.
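The key idea in the abstract can be illustrated with a minimal sketch: a hidden layer of spiking (leaky integrate-and-fire) neurons feeds a non-spiking output layer that only integrates its input, and the output layer's final membrane voltage is read out as the Q-values. The layer sizes, weights, time constant, and threshold below are all illustrative assumptions, not the paper's actual architecture or training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 8-dim observation, 16 hidden spiking neurons, 4 actions.
W_in = rng.normal(scale=0.5, size=(16, 8))
W_out = rng.normal(scale=0.5, size=(4, 16))

def dsqn_forward(obs, T=4, tau=2.0, v_th=1.0):
    """Run the network for T timesteps; return Q-values as the final
    membrane voltage of the non-spiking output neurons."""
    v_h = np.zeros(16)  # hidden (spiking) membrane potentials
    v_o = np.zeros(4)   # output (non-spiking) membrane potentials
    for _ in range(T):
        # Hidden layer: leaky integrate-and-fire with hard reset.
        v_h = v_h + (W_in @ obs - v_h) / tau
        spikes = (v_h >= v_th).astype(float)
        v_h = v_h * (1.0 - spikes)  # reset neurons that fired
        # Output layer: leaky integration only -- no threshold, no spikes.
        v_o = v_o + (W_out @ spikes - v_o) / tau
    return v_o  # membrane voltage read out as Q(s, a)

q = dsqn_forward(rng.normal(size=8))
action = int(np.argmax(q))  # greedy action selection over the voltage readout
```

Because the output neurons never fire or reset, their voltage is a continuous quantity, which is what allows it to stand in for a real-valued Q-function inside a standard end-to-end RL loss.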