Paper Title
Predicting Goal-directed Human Attention Using Inverse Reinforcement Learning
Paper Authors
Paper Abstract
Being able to predict human gaze behavior has obvious importance for behavioral vision and for computer vision applications. Most models have mainly focused on predicting free-viewing behavior using saliency maps, but these predictions do not generalize to goal-directed behavior, such as when a person searches for a visual target object. We propose the first inverse reinforcement learning (IRL) model to learn the internal reward function and policy used by humans during visual search. The viewer's internal belief states were modeled as dynamic contextual belief maps of object locations. These maps were learned by IRL and then used to predict behavioral scanpaths for multiple target categories. To train and evaluate our IRL model, we created COCO-Search18, currently the largest dataset of high-quality search fixations in existence. COCO-Search18 has 10 participants searching for each of 18 target-object categories in 6202 images, making about 300,000 goal-directed fixations. When trained and evaluated on COCO-Search18, the IRL model outperformed baseline models in predicting search fixation scanpaths, in terms of both similarity to human search behavior and search efficiency. Finally, reward maps recovered by the IRL model reveal distinctive target-dependent patterns of object prioritization, which we interpret as a learned object context.
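To make the abstract's method concrete, below is a minimal sketch of one common way to instantiate such an IRL model: a GAIL-style adversarial setup in which a discriminator scores (belief map, fixation) pairs and its output serves as the recovered reward, while a policy proposes the next fixation on a coarse grid. The grid size, network shapes, belief update, and training step here are illustrative assumptions for exposition, not the authors' implementation.

```python
# Minimal GAIL-style IRL sketch for fixation-scanpath prediction.
# All sizes and module designs are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

GRID = 20 * 32        # hypothetical 20x32 fixation grid, flattened to 640 cells
BELIEF_DIM = GRID     # belief map over object locations on the same grid

class Policy(nn.Module):
    """Maps the current belief map to a distribution over next fixations."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(BELIEF_DIM, 256), nn.ReLU(),
            nn.Linear(256, GRID),
        )
    def forward(self, belief):
        return F.log_softmax(self.net(belief), dim=-1)

class Discriminator(nn.Module):
    """Scores (belief, fixation) pairs; log D acts as the learned reward."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(BELIEF_DIM + GRID, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )
    def forward(self, belief, fixation_onehot):
        return self.net(torch.cat([belief, fixation_onehot], dim=-1))

def update_belief(belief, fixation_idx):
    """Toy dynamic-belief update: zero out the attended cell (a crude
    inhibition-of-return stand-in), applied between fixations in a rollout."""
    belief = belief.clone()
    belief[torch.arange(belief.size(0)), fixation_idx] = 0.0
    return belief

policy, disc = Policy(), Discriminator()
opt_p = torch.optim.Adam(policy.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

def train_step(expert_belief, expert_fix):
    """One adversarial update on a batch of single expert fixations."""
    batch = expert_belief.size(0)
    # 1) Roll the policy forward one fixation.
    logp = policy(expert_belief)
    fix = torch.multinomial(logp.exp(), 1).squeeze(-1)
    fix_onehot = F.one_hot(fix, GRID).float()
    exp_onehot = F.one_hot(expert_fix, GRID).float()
    # 2) Discriminator: push expert pairs toward 1, policy pairs toward 0.
    d_loss = (F.binary_cross_entropy_with_logits(
                  disc(expert_belief, exp_onehot), torch.ones(batch, 1)) +
              F.binary_cross_entropy_with_logits(
                  disc(expert_belief, fix_onehot), torch.zeros(batch, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 3) Policy: maximize the recovered reward log D via REINFORCE.
    with torch.no_grad():
        reward = F.logsigmoid(disc(expert_belief, fix_onehot)).squeeze(-1)
    p_loss = -(reward * logp.gather(1, fix.unsqueeze(-1)).squeeze(-1)).mean()
    opt_p.zero_grad(); p_loss.backward(); opt_p.step()
    return d_loss.item(), p_loss.item()

# Toy usage with random data; real inputs would be belief maps derived
# from images and human fixations from a dataset such as COCO-Search18.
beliefs = torch.rand(8, BELIEF_DIM)
expert_fixes = torch.randint(0, GRID, (8,))
print(train_step(beliefs, expert_fixes))
beliefs = update_belief(beliefs, expert_fixes)  # step the belief for the next fixation
```

Once trained, sampling repeatedly from the policy while stepping the belief map yields a predicted scanpath, and the discriminator's per-cell scores can be read out as a reward map of the kind the abstract interprets as learned object context.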