Paper Title
Towards Sample Efficient Agents through Algorithmic Alignment
Authors
Abstract
In this work, we propose and explore the Deep Graph Value Network (DeepGV), a promising method for addressing sample complexity in deep reinforcement learning agents through a message-passing mechanism. The main idea is that the agent should be guided by structured, non-neural-network algorithms such as dynamic programming. According to recent advances in algorithmic alignment, neural networks with structured computation procedures can be trained efficiently. We demonstrate the potential of graph neural networks to support sample-efficient learning by showing that the Deep Graph Value Network can outperform unstructured baselines by a large margin in solving Markov Decision Processes (MDPs). We believe this opens up a new avenue for structured agent design. See https://github.com/drmeerkat/Deep-Graph-Value-Network for the code.
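To make the link between dynamic programming and message passing concrete, the following is a minimal sketch of classical value iteration on a toy deterministic MDP, written so that each Bellman backup reads as a message sent along a graph edge and aggregated with a max at the receiving node. It is not the DeepGV architecture or code from the linked repository. The graph, the rewards, the discount factor, and every function name here are illustrative assumptions.

```python
# Toy deterministic MDP as a graph: node -> list of (next_node, reward) edges.
# All names and numbers are illustrative assumptions, not taken from the paper.
GRAPH = {
    0: [(1, 0.0), (2, 0.0)],
    1: [(3, 1.0)],
    2: [(3, 0.0)],
    3: [],  # terminal state: no outgoing edges
}
GAMMA = 0.9  # assumed discount factor


def message(reward, next_value, gamma=GAMMA):
    """Message sent along one edge: a one-step Bellman backup."""
    return reward + gamma * next_value


def value_iteration_step(values, graph=GRAPH):
    """One message-passing round: each node max-aggregates its edge messages."""
    new_values = {}
    for node, edges in graph.items():
        msgs = [message(reward, values[nxt]) for nxt, reward in edges]
        # A node with no outgoing edges (terminal) keeps value 0.
        new_values[node] = max(msgs) if msgs else 0.0
    return new_values


def value_iteration(graph=GRAPH, num_rounds=10):
    """Run a fixed number of synchronous message-passing rounds from zeros."""
    values = {node: 0.0 for node in graph}
    for _ in range(num_rounds):
        values = value_iteration_step(values, graph)
    return values


if __name__ == "__main__":
    print(value_iteration())
```

In this sketch the message and aggregation functions are fixed by hand. A graph neural network that is algorithmically aligned with dynamic programming would, under the same reading, replace them with learned functions while keeping the same per-edge, per-node computation structure.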