Implications of Human Irrationality for Reinforcement Learning

Title

Implications of Human Irrationality for Reinforcement Learning

Authors

Haiyang Chen, Hyung Jin Chang, Andrew Howes

Abstract

Recent work in the behavioural sciences has begun to overturn the long-held belief that human decision making is irrational, suboptimal and subject to biases. This turn to the rational suggests that human decision making may be a better source of ideas for constraining how machine learning problems are defined than would otherwise be the case. One promising idea concerns human decision making that is dependent on apparently irrelevant aspects of the choice context. Previous work has shown that by taking into account choice context and making relational observations, people can maximize expected value. Other work has shown that partially observable Markov decision processes (POMDPs) are a useful way to formulate human-like decision problems. Here, we propose a novel POMDP model for contextual choice tasks and show that, despite the apparent irrationalities, a reinforcement learner can take advantage of the way that humans make decisions. We suggest that human irrationalities may offer a productive source of inspiration for improving the design of AI architectures and machine learning methods.
