Paper Title
Zero-Shot Category-Level Object Pose Estimation
Paper Authors
Paper Abstract
Object pose estimation is an important component of most vision pipelines for embodied agents, as well as in 3D vision more generally. In this paper we tackle the problem of estimating the pose of novel object categories in a zero-shot manner. This extends much of the existing literature by removing the need for pose-labelled datasets or category-specific CAD models for training or inference. Specifically, we make the following contributions. First, we formalise the zero-shot, category-level pose estimation problem and frame it in a way that is most applicable to real-world embodied agents. Second, we propose a novel method based on semantic correspondences from a self-supervised vision transformer to solve the pose estimation problem. We further re-purpose the recent CO3D dataset to present a controlled and realistic test setting. Finally, we demonstrate that all baselines for our proposed task perform poorly, and show that our method provides a six-fold improvement in average rotation accuracy at 30 degrees. Our code is available at https://github.com/applied-ai-lab/zero-shot-pose.
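The abstract describes establishing semantic correspondences from self-supervised vision transformer features and using them to recover relative pose. As a rough illustration of that kind of pipeline (a minimal sketch, not the authors' implementation; the descriptor inputs and any depth backprojection are assumed), the following shows two generic building blocks: mutual nearest-neighbour matching of patch descriptors, and SVD-based (Kabsch/Umeyama-style) rigid alignment of matched 3D points.

```python
# Illustrative sketch only -- not the code from the linked repository.
# Assumes dense, L2-normalised ViT patch descriptors have already been
# extracted for a reference and a target view of the object.
import numpy as np


def mutual_nearest_neighbours(feats_a: np.ndarray, feats_b: np.ndarray):
    """Return index pairs (i, j) that are mutual nearest neighbours
    between descriptor sets of shape (N_a, D) and (N_b, D)."""
    sim = feats_a @ feats_b.T          # cosine similarity (unit-norm inputs)
    nn_ab = sim.argmax(axis=1)         # best match in B for each row of A
    nn_ba = sim.argmax(axis=0)         # best match in A for each row of B
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]


def rigid_alignment(src: np.ndarray, dst: np.ndarray):
    """Least-squares rotation R and translation t mapping 3D points
    src -> dst, via the standard SVD-based solution."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t
```

In a full pipeline, the matched patch centres would be backprojected to 3D (e.g. using the depth available in CO3D) before the alignment step, typically inside a robust estimation loop such as RANSAC; the linked repository contains the authors' actual method.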