Paper Title

Time-Efficient Reward Learning via Visually Assisted Cluster Ranking

Authors

David Zhang, Micah Carroll, Andreea Bobu, Anca Dragan

Abstract

One of the most successful paradigms for reward learning uses human feedback in the form of comparisons. Although these methods hold promise, human comparison labeling is expensive and time-consuming, constituting a major bottleneck to their broader applicability. Our insight is that we can greatly improve how effectively human time is used in these approaches by batching comparisons together, rather than having the human label each comparison individually. To do so, we leverage data dimensionality-reduction and visualization techniques to provide the human with an interactive GUI displaying the state space, in which the user can label subportions of the state space. Across some simple MuJoCo tasks, we show that this high-level approach holds promise and is able to greatly increase the performance of the resulting agents, given the same amount of human labeling time.
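
To make the batching idea concrete, here is a minimal sketch of how a single ranking over clusters could be expanded into many pairwise comparisons. It is an illustration under stated assumptions, not the authors' implementation: it assumes the user has already grouped states into clusters in the GUI and ordered them from least to most preferred, and the function name and state ids are hypothetical.

```python
from itertools import combinations


def comparisons_from_cluster_ranking(ranked_clusters):
    """Expand a worst-to-best ranking of state clusters into pairwise labels.

    ranked_clusters: list of lists of state ids, ordered from least to most
    preferred (an assumed input format). Returns (worse, better) pairs.
    """
    pairs = []
    # Every state in a lower-ranked cluster is labeled worse than every
    # state in a higher-ranked cluster.
    for low, high in combinations(range(len(ranked_clusters)), 2):
        for worse in ranked_clusters[low]:
            for better in ranked_clusters[high]:
                pairs.append((worse, better))
    return pairs


# Hypothetical example: three clusters a user selected and ranked.
clusters = [[0, 1, 2], [3, 4], [5, 6, 7, 8]]
pairs = comparisons_from_cluster_ranking(clusters)
print(len(pairs))  # 26 comparisons from one ranking of three clusters
```

In this toy case, one ranking action yields 26 comparison labels that would otherwise have to be given one at a time. States within the same cluster are left uncompared, which is one possible design choice.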
