Paper Title
Incremental Object Grounding Using Scene Graphs
Authors
Abstract
Object grounding tasks aim to locate a target object in an image through verbal communication. Understanding human commands is essential for effective human-robot communication, but it is challenging because commands can be ambiguous or erroneous. This paper aims to disambiguate a human's referring expressions by allowing the agent to ask relevant questions based on semantic data obtained from scene graphs. We test whether our agent can use relations between objects in a scene graph to ask semantically relevant questions that disambiguate the original user command. We present Incremental Grounding using Scene Graphs (IGSG), a disambiguation model that uses semantic data from an image scene graph and linguistic structures from a language scene graph to ground objects based on human commands. Compared to the baseline, IGSG shows promising results in complex real-world scenes that contain multiple identical target objects. IGSG can effectively resolve ambiguous or wrong referring expressions by asking disambiguating questions back to the user.
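To make the idea of question-driven disambiguation concrete, the following is a minimal toy sketch and not the IGSG implementation. It assumes a scene graph can be reduced to objects carrying (predicate, object) relations, that the user answers yes/no questions, and that a command whose relations match nothing is handled by falling back to the object name alone. All names here (`Relation`, `SceneObject`, `pick_question`, `ground`) are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Relation:
    predicate: str    # e.g. "on"
    object_name: str  # e.g. "table"


@dataclass
class SceneObject:
    object_id: int
    name: str
    relations: List[Relation] = field(default_factory=list)


def candidates(scene: Sequence[SceneObject], target_name: str,
               required: Sequence[Relation] = ()) -> List[SceneObject]:
    """Objects with the requested name that hold every required relation."""
    return [o for o in scene
            if o.name == target_name and all(r in o.relations for r in required)]


def pick_question(cands: Sequence[SceneObject]) -> Optional[Relation]:
    """Return a relation held by some but not all candidates, or None."""
    seen: List[Relation] = []
    for obj in cands:
        for rel in obj.relations:
            if rel not in seen:
                seen.append(rel)
    for rel in seen:
        holders = sum(rel in o.relations for o in cands)
        if 0 < holders < len(cands):
            return rel
    return None


def ground(scene: Sequence[SceneObject], target_name: str,
           answer_fn: Callable[[Relation], bool],
           required: Sequence[Relation] = ()) -> List[SceneObject]:
    """Narrow the candidate set by asking yes/no questions about relations."""
    cands = candidates(scene, target_name, required)
    if not cands:
        # Assumption: treat unmatched relations as an erroneous command
        # and fall back to the object name only.
        cands = candidates(scene, target_name)
    while len(cands) > 1:
        rel = pick_question(cands)
        if rel is None:
            break  # remaining candidates are indistinguishable by relations
        if answer_fn(rel):
            cands = [o for o in cands if rel in o.relations]
        else:
            cands = [o for o in cands if rel not in o.relations]
    return cands


# Toy scene: three cups that differ only in their relations.
scene = [
    SceneObject(0, "cup", [Relation("on", "table")]),
    SceneObject(1, "cup", [Relation("on", "shelf")]),
    SceneObject(2, "cup", [Relation("on", "table"), Relation("next to", "laptop")]),
]
intended = scene[2]
# Simulated user: answers "yes" when the intended cup holds the asked relation.
result = ground(scene, "cup", lambda rel: rel in intended.relations)
```

Each question splits the candidates into two non-empty groups, so the loop always shrinks the set and terminates. A real system would derive the relations from an image scene graph generator and the target name from a parsed command, which this sketch leaves out.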