Paper Title

Where Are You? Localization from Embodied Dialog

Paper Authors

Meera Hahn, Jacob Krantz, Dhruv Batra, Devi Parikh, James M. Rehg, Stefan Lee, Peter Anderson

Paper Abstract

We present Where Are You? (WAY), a dataset of ~6k dialogs in which two humans -- an Observer and a Locator -- complete a cooperative localization task. The Observer is spawned at random in a 3D environment and can navigate from first-person views while answering questions from the Locator. The Locator must localize the Observer in a detailed top-down map by asking questions and giving instructions. Based on this dataset, we define three challenging tasks: Localization from Embodied Dialog or LED (localizing the Observer from dialog history), Embodied Visual Dialog (modeling the Observer), and Cooperative Localization (modeling both agents). In this paper, we focus on the LED task -- providing a strong baseline model with detailed ablations characterizing both dataset biases and the importance of various modeling choices. Our best model achieves 32.7% success at identifying the Observer's location within 3m in unseen buildings, vs. 70.4% for human Locators.
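
To make the reported metric concrete, here is a minimal sketch of the within-3m success criterion used to evaluate LED predictions, assuming Observer locations are given as 2D map coordinates in meters; the function name `led_success` and its signature are hypothetical illustrations, not the paper's actual evaluation code.

```python
import math

def led_success(pred_xy, true_xy, threshold_m=3.0):
    """Hypothetical sketch: an LED prediction counts as a success if the
    predicted Observer location lies within threshold_m meters (3m in the
    paper) of the true location on the top-down map."""
    dx = pred_xy[0] - true_xy[0]
    dy = pred_xy[1] - true_xy[1]
    return math.hypot(dx, dy) <= threshold_m

# Example: a prediction 2.5m from the true location counts as a success.
print(led_success((1.5, 2.0), (1.5, 4.5)))  # True (distance = 2.5m <= 3.0m)
```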
