Using Machine Teaching to Investigate Human Assumptions when Teaching Reinforcement Learners

Paper Title

Using Machine Teaching to Investigate Human Assumptions when Teaching Reinforcement Learners

Authors

Chuang, Yun-Shiuan; Zhang, Xuezhou; Ma, Yuzhe; Ho, Mark K.; Austerweil, Joseph L.; Zhu, Xiaojin

Abstract

Successful teaching requires an assumption of how the learner learns: how the learner uses experiences from the world to update its internal states. We investigate what expectations people have about a learner when they teach it in an online manner using rewards and punishments. We focus on a common reinforcement learning method, Q-learning, and examine people's assumptions using a behavioral experiment. To do so, we first establish a normative standard by formulating the problem as a machine teaching optimization problem. To solve this optimization problem, we use a deep learning approximation method that simulates learners in the environment and learns to predict how feedback affects the learner's internal states. What do people assume about a learner's learning and discount rates when they teach it an idealized exploration-exploitation task? In a behavioral experiment, we find that people can teach the task to Q-learners in a relatively efficient and effective manner when the learner uses a small value for its discount rate and a large value for its learning rate. However, their teaching is still suboptimal. We also find that providing people with real-time updates of how possible feedback would affect the Q-learner's internal states weakly helps them teach. Our results reveal how people teach using evaluative feedback and provide guidance for how engineers should design machine agents in a manner that is intuitive for people.
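To make the roles of the learning rate and the discount rate concrete, the following is a minimal sketch of the standard tabular Q-learning update, with the teacher's feedback treated as the reward signal. It is an illustration only and not the paper's experimental setup: the state names, the action set, the feedback value, and the function name `q_update` are all hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, feedback, next_state, actions, alpha, gamma):
    """Apply one standard tabular Q-learning update.

    feedback: scalar reward or punishment from the teacher (assumed numeric).
    alpha: learning rate; gamma: discount rate.
    """
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate feedback plus discounted future value.
    td_target = feedback + gamma * best_next
    # Move the current estimate toward the target by a fraction alpha.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def make_q():
    """Hypothetical Q-table in which one downstream value is already learned."""
    q = defaultdict(float)
    q[("s1", "right")] = 1.0
    return q


actions = ["left", "right"]  # hypothetical action set

# Learner A: large learning rate, small discount rate.
v_a = q_update(make_q(), "s0", "right", 1.0, "s1", actions, alpha=0.9, gamma=0.1)

# Learner B: small learning rate, large discount rate.
v_b = q_update(make_q(), "s0", "right", 1.0, "s1", actions, alpha=0.1, gamma=0.9)

print(round(v_a, 2), round(v_b, 2))
```

With the same single unit of positive feedback, learner A's estimate moves most of the way to its target (about 0.99), while learner B's estimate moves only a small step (about 0.19) and its target depends more heavily on future value. This is one way to see why the two parameters could shape how directly a teacher's feedback shows up in the learner's internal state.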
