Paper Title
Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments
Paper Authors
Paper Abstract
We develop a language-guided navigation task set in a continuous 3D environment where agents must execute low-level actions to follow natural language navigation directions. By being situated in continuous environments, this setting lifts a number of assumptions implicit in prior work that represents environments as a sparse graph of panoramas with edges corresponding to navigability. Specifically, our setting drops the presumptions of known environment topologies, short-range oracle navigation, and perfect agent localization. To contextualize this new task, we develop models that mirror many of the advances made in prior settings as well as single-modality baselines. While some of these techniques transfer, we find significantly lower absolute performance in the continuous setting, suggesting that performance in prior 'navigation-graph' settings may be inflated by the strong implicit assumptions.