Paper Title

Occlusion resistant learning of intuitive physics from videos

Authors

Riochet, Ronan, Sivic, Josef, Laptev, Ivan, Dupoux, Emmanuel

Abstract


To reach human performance on complex tasks, a key ability for artificial systems is to understand physical interactions between objects and to predict future outcomes of a situation. This ability, often referred to as intuitive physics, has recently received attention, and several methods have been proposed to learn these physical rules from video sequences. Yet most of these methods are restricted to the case where no, or only limited, occlusions occur. In this work we propose a probabilistic formulation of learning intuitive physics in 3D scenes with significant inter-object occlusions. In our formulation, object positions are modeled as latent variables enabling the reconstruction of the scene. We then propose a series of approximations that make this problem tractable. Object proposals are linked across frames using a combination of a recurrent interaction network, modeling the physics in object space, and a compositional renderer, modeling the way in which objects project onto pixel space. We demonstrate significant improvements over the state of the art on the IntPhys intuitive physics benchmark. We apply our method to a second dataset with increasing levels of occlusion, showing that it realistically predicts segmentation masks up to 30 frames into the future. Finally, we also show results on predicting the motion of objects in real videos.
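The two components named in the abstract can be illustrated with a minimal NumPy sketch: a per-step interaction model that updates object states in object space, and a compositional renderer that projects objects onto pixel space as segmentation masks, with later objects occluding earlier ones. The toy inverse-square repulsion dynamics, circular masks, and all function names here are illustrative assumptions, not the authors' learned networks.

```python
import numpy as np

def interaction_step(pos, vel, dt=0.1, k=0.5):
    # Toy stand-in for an interaction network: aggregate pairwise
    # "effects" (here, inverse-square repulsion) per object, then
    # integrate velocity and position.
    n = pos.shape[0]
    acc = np.zeros_like(pos)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = pos[i] - pos[j]
            dist = np.linalg.norm(d) + 1e-6
            acc[i] += k * d / dist**3
    vel = vel + dt * acc
    pos = pos + dt * vel
    return pos, vel

def render(pos, radius=3, size=32):
    # Toy compositional renderer: paint each object as a circular
    # mask on a shared canvas; objects painted later occlude those
    # painted earlier, mimicking inter-object occlusion.
    canvas = np.zeros((size, size), dtype=np.int32)
    yy, xx = np.mgrid[0:size, 0:size]
    for obj_id, (x, y) in enumerate(pos, start=1):
        mask = (xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2
        canvas[mask] = obj_id
    return canvas
```

Rolling `interaction_step` forward and calling `render` at each step yields a sequence of predicted segmentation masks, analogous to the 30-frame mask predictions evaluated in the paper.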
