Paper Title

Social Network Structure Shapes Innovation: Experience-sharing in RL with SAPIENS

Authors

Eleni Nisioti, Mateo Mahaut, Pierre-Yves Oudeyer, Ida Momennejad, Clément Moulin-Frier

Abstract

Human culture relies on innovation: our ability to continuously explore how existing elements can be combined to create new ones. Innovation is not solitary; it relies on collective search and accumulation. Reinforcement learning (RL) approaches commonly assume that fully-connected groups are best suited for innovation. However, human laboratory and field studies have shown that hierarchical innovation is more robustly achieved by dynamic social network structures. In dynamic settings, humans oscillate between innovating individually or in small clusters and sharing outcomes with others. To our knowledge, the role of social network structure in innovation has not been systematically studied in RL. Here, we use a multi-level problem setting (WordCraft) with three different innovation tasks to test the hypothesis that social network structure affects the performance of distributed RL algorithms. We systematically design networks of DQNs that share experiences from their replay buffers in varying structures (fully-connected, small-world, dynamic, ring) and introduce a set of behavioral and mnemonic metrics that extend the classical reward-focused evaluation framework of RL. Comparing the level of innovation achieved by different social network structures across different tasks shows that, first, consistent with human findings, experience sharing within a dynamic structure achieves the highest level of innovation in tasks with a deceptive nature and large search spaces. Second, experience sharing is not as helpful when there is a single clear path to innovation. Third, the metrics we propose can help explain the success of different social network structures on different tasks, with the diversity of experiences at the individual and group level lending crucial insights.
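To make the idea of experience sharing across a social network more concrete, the following is a minimal sketch and not the authors' implementation. It assumes each agent owns a replay buffer and, once per sharing round, copies a random sample of its transitions into the buffers of its neighbors. The function names (`fully_connected`, `ring`, `share_experiences`), the `batch_size` parameter, and the placeholder transitions are all hypothetical and chosen only for illustration.

```python
import random
from collections import deque


def fully_connected(n):
    """Every agent is a neighbor of every other agent."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def ring(n):
    """Each agent is linked to its two adjacent agents on a cycle."""
    return {i: sorted({(i - 1) % n, (i + 1) % n} - {i}) for i in range(n)}


def share_experiences(buffers, neighbors, batch_size, rng):
    """Each agent copies a random sample of its buffer to every neighbor."""
    # Sample from all buffers first, so that transitions received in this
    # round are not forwarded again within the same round.
    outgoing = {
        i: rng.sample(list(buf), min(batch_size, len(buf)))
        for i, buf in buffers.items()
    }
    for i, sample in outgoing.items():
        for j in neighbors[i]:
            buffers[j].extend(sample)


n_agents = 4
rng = random.Random(0)
# Hypothetical placeholder transitions: (agent_id, step) stands in for the
# (state, action, reward, next_state) tuples a DQN would store.
buffers = {
    i: deque([(i, t) for t in range(3)], maxlen=100) for i in range(n_agents)
}
share_experiences(buffers, ring(n_agents), batch_size=2, rng=rng)
```

In this sketch, a dynamic structure could be approximated by passing a different `neighbors` mapping in different sharing rounds, for example alternating between small clusters and a wider topology; how the paper actually schedules such changes is not specified in the abstract.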
