Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior

Paper Title

Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior

Authors

Baihan Lin, Djallel Bouneffouf, Guillermo Cecchi

Abstract

As an important psychological and social experiment, the Iterated Prisoner's Dilemma (IPD) treats the choice to cooperate or defect as an atomic action. We propose to study the behavior of online learning algorithms in the IPD game, investigating the full spectrum of reinforcement learning agents: multi-armed bandits, contextual bandits, and reinforcement learning. We evaluate them in an IPD tournament in which multiple agents compete in a sequential fashion. This allows us to analyze the dynamics of policies learned by multiple self-interested, independent, reward-driven agents, and also to study the capacity of these algorithms to fit human behavior. Results suggest that making decisions based on the current situation performs worst in this kind of social dilemma game. Multiple findings on online learning behaviors, along with clinical validations, are reported as an effort to connect artificial intelligence algorithms with human behaviors and their abnormal states in neuropsychiatric conditions.
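To make the setting concrete, the following is a minimal sketch of one IPD match between a context-free bandit agent and a tit-for-tat opponent. It is an illustration only, not the authors' implementation: the payoff values (5, 3, 1, 0) are the conventional ones and are assumed here, and the names `EpsilonGreedyBandit` and `play_match`, the epsilon-greedy rule, and the choice of opponent are all hypothetical.

```python
import random

# Conventional IPD payoffs (an assumption, not taken from the paper):
# (my_move, their_move) -> my reward; "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ("C", "D")


class EpsilonGreedyBandit:
    """Hypothetical context-free bandit: keeps a running mean reward per action."""

    def __init__(self, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.counts = {a: 0 for a in ACTIONS}
        self.means = {a: 0.0 for a in ACTIONS}

    def act(self):
        # Explore with probability epsilon, otherwise pick the best mean so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.means[a])

    def update(self, action, reward):
        # Incremental update of the mean reward for the chosen action.
        self.counts[action] += 1
        self.means[action] += (reward - self.means[action]) / self.counts[action]


def play_match(agent, rounds=200):
    """Play the bandit against tit-for-tat; return both cumulative scores."""
    tft_move = "C"  # tit-for-tat opens with cooperation
    agent_score = tft_score = 0
    for _ in range(rounds):
        move = agent.act()
        reward = PAYOFF[(move, tft_move)]
        agent_score += reward
        tft_score += PAYOFF[(tft_move, move)]
        agent.update(move, reward)
        tft_move = move  # tit-for-tat repeats the agent's last move
    return agent_score, tft_score


if __name__ == "__main__":
    bandit = EpsilonGreedyBandit(epsilon=0.1, rng=random.Random(0))
    print(play_match(bandit))
```

A contextual bandit or reinforcement learning agent would differ mainly in that `act` would also receive some history of previous moves as input; a tournament would repeat `play_match` over every pairing of agents.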
