Title
Visual Pre-training for Navigation: What Can We Learn from Noise?
Authors

Abstract
One powerful paradigm in visual navigation is to predict actions directly from observations. Training such an end-to-end system allows representations useful for downstream tasks to emerge automatically. However, the lack of inductive bias makes this system data-inefficient. We hypothesize that a sufficient representation of the current view and the goal view for a navigation policy can be learned by predicting the location and size of a crop of the current view that corresponds to the goal view. We further show that training such a random-crop prediction task in a self-supervised fashion, purely on synthetic noise images, transfers well to natural home images. The learned representation can then be bootstrapped to learn a navigation policy efficiently with little interaction data. The code is available at https://yanweiw.github.io/noise2ptz
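The self-supervised task described above can be sketched as a data-generation step: draw a synthetic noise image as the "current view", take a random crop of it as the "goal view", and regress the crop's normalized location and size. The sketch below is a minimal illustration with NumPy; the function name, image size, and exact sampling ranges are assumptions, not the paper's implementation.

```python
import numpy as np

def make_crop_prediction_sample(rng, img_size=64, min_crop=16, max_crop=48):
    """Generate one self-supervised training pair (hypothetical helper).

    Returns a synthetic noise image (standing in for the current view),
    a random crop of it (standing in for the goal view), and the crop's
    normalized center and size as the regression target.
    """
    # Synthetic noise image: pure random pixels, no natural-image data needed.
    current_view = rng.integers(0, 256, size=(img_size, img_size, 3),
                                dtype=np.uint8)
    # Sample a crop size and a top-left corner uniformly within bounds.
    crop = int(rng.integers(min_crop, max_crop + 1))
    x = int(rng.integers(0, img_size - crop + 1))
    y = int(rng.integers(0, img_size - crop + 1))
    goal_view = current_view[y:y + crop, x:x + crop]
    # Target: crop center (x, y) and size, each normalized to [0, 1].
    target = np.array([(x + crop / 2) / img_size,
                       (y + crop / 2) / img_size,
                       crop / img_size], dtype=np.float32)
    return current_view, goal_view, target
```

A policy network would then be trained to map the (current view, goal view) pair to the target vector; per the abstract, that pretext task alone yields features that transfer to navigation in natural home images.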