Paper Title

COPE: End-to-end trainable Constant Runtime Object Pose Estimation

Authors

Stefan Thalhammer, Timothy Patten, Markus Vincze

Abstract

State-of-the-art object pose estimation handles multiple instances in a test image by using multi-model formulations: detection as a first stage and then separately trained networks per object for 2D-3D geometric correspondence prediction as a second stage. Poses are subsequently estimated using the Perspective-n-Point (PnP) algorithm at runtime. Unfortunately, multi-model formulations are slow and do not scale well with the number of object instances involved. Recent approaches show that direct 6D object pose estimation is feasible when derived from the aforementioned geometric correspondences. We present an approach that learns an intermediate geometric representation of multiple objects to directly regress 6D poses of all instances in a test image. The inherent end-to-end trainability overcomes the requirement of separately processing individual object instances. By calculating the mutual Intersection-over-Union (IoU), pose hypotheses are clustered into distinct instances, which achieves negligible runtime overhead with respect to the number of object instances. Results on multiple challenging standard datasets show that the pose estimation performance is superior to single-model state-of-the-art approaches despite being more than ~35 times faster. We additionally provide an analysis showing real-time applicability (>24 fps) for images where more than 90 object instances are present. Further results show the advantage of supervising geometric-correspondence-based object pose estimation with the 6D pose.
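The abstract does not specify how the mutual-IoU clustering is implemented. As a rough illustration of the idea only, the Python sketch below greedily groups pose hypotheses by the IoU of their projected 2D bounding boxes: every name here (`iou`, `cluster_hypotheses`), the greedy confidence-sorted assignment, and the 0.5 threshold are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def cluster_hypotheses(boxes, scores, iou_threshold=0.5):
    """Greedily group pose hypotheses whose 2D boxes mutually overlap.

    boxes: (N, 4) array of hypothesis boxes, scores: (N,) confidences.
    Returns a list of index lists, one per distinct object instance.
    """
    order = np.argsort(scores)[::-1].tolist()  # highest confidence first
    clusters = []
    for i in order:
        placed = False
        for cluster in clusters:
            # Assign to the first cluster whose seed hypothesis overlaps enough.
            if iou(boxes[i], boxes[cluster[0]]) >= iou_threshold:
                cluster.append(i)
                placed = True
                break
        if not placed:
            clusters.append([i])  # start a new instance cluster
    return clusters

# Example: three hypotheses; the first two overlap and collapse into one instance.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(cluster_hypotheses(boxes, scores))  # [[0, 1], [2]]
```

Under these assumptions, each hypothesis is compared only against existing cluster seeds, which is one simple way such clustering could add negligible runtime overhead as the number of instances grows.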
