Paper Title
Mimicking Evolution with Reinforcement Learning
Paper Authors
Paper Abstract
Evolution gave rise to human and animal intelligence here on Earth. We argue that the path to developing artificial human-like intelligence will pass through mimicking the evolutionary process in a nature-like simulation. In Nature, two processes drive the development of the brain: evolution and learning. Evolution acts slowly, across generations, and, amongst other things, defines what agents learn by changing their internal reward function. Learning acts fast, within a single lifetime, and quickly updates an agent's policy to maximise pleasure and minimise pain. Evolution slowly aligns the reward function with the fitness function; however, as agents evolve, the environment and its fitness function also change, increasing the misalignment between reward and fitness. Replicating these two processes in simulation is extremely computationally expensive. This work proposes Evolution via Evolutionary Reward (EvER), which allows learning to single-handedly drive the search for policies of increasing evolutionary fitness by ensuring the alignment of the reward function with the fitness function. In this search, EvER makes use of the whole state-action trajectories that agents go through during their lifetime. In contrast, current evolutionary algorithms discard this information and consequently limit their potential efficiency at tackling sequential decision problems. We test our algorithm in two simple bio-inspired environments and show that, compared with a state-of-the-art evolutionary algorithm, it generates agents that are more capable of surviving and reproducing their genes.
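To make the core idea concrete, the sketch below shows one way the abstract's claim could look in code: a policy-gradient loop in which the per-step reward is defined as the increment in evolutionary fitness (survival and reproduction events), so that maximising return is, by construction, aligned with maximising fitness, and credit assignment uses the whole state-action trajectory rather than only a final fitness score. This is a minimal illustrative sketch, not the authors' implementation; the toy environment, its fitness signal, and all hyperparameters are assumptions made for clarity.

```python
# Illustrative sketch (assumed toy setup, not the paper's EvER implementation):
# reward := per-step fitness increment, learned with REINFORCE over whole trajectories.
import math
import random

N_STATES, N_ACTIONS = 5, 2
theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # tabular policy logits

def policy(state):
    """Softmax over the action logits of the given state."""
    logits = theta[state]
    m = max(logits)
    exp = [math.exp(l - m) for l in logits]
    z = sum(exp)
    return [e / z for e in exp]

def step(state, action):
    """Toy dynamics (assumption): action 1 'forages' with a chance to reproduce,
    action 0 idles. Fitness grows by 1 for surviving the step and by 5 for reproducing."""
    reproduced = action == 1 and random.random() < 0.3
    fitness_gain = 1 + (5 if reproduced else 0)  # reward is the fitness increment
    next_state = random.randrange(N_STATES)
    return next_state, fitness_gain

def run_episode(horizon=20):
    """Collect the whole state-action-reward trajectory of one lifetime."""
    trajectory, state = [], random.randrange(N_STATES)
    for _ in range(horizon):
        probs = policy(state)
        action = random.choices(range(N_ACTIONS), weights=probs)[0]
        next_state, reward = step(state, action)
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory

def reinforce_update(trajectory, lr=0.05):
    """REINFORCE over the full trajectory: each step is credited with its
    reward-to-go, information a fitness-only evolutionary algorithm discards."""
    rewards = [r for _, _, r in trajectory]
    for t, (s, a, _) in enumerate(trajectory):
        ret = sum(rewards[t:])  # reward-to-go from step t
        probs = policy(s)
        for b in range(N_ACTIONS):
            grad = (1.0 if b == a else 0.0) - probs[b]  # d log pi(a|s) / d theta[s][b]
            theta[s][b] += lr * ret * grad

for episode in range(200):
    reinforce_update(run_episode())

print("Learned probability of foraging in state 0:", round(policy(0)[1], 3))
```

The contrast with a standard evolutionary algorithm is in `reinforce_update`: an evolution strategy would summarise each lifetime by a single fitness scalar, whereas here every step of the trajectory contributes its own gradient term.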