Paper title
The Right Spin: Learning Object Motion from Rotation-Compensated Flow Fields
Paper authors
Abstract
Both a good understanding of geometrical concepts and a broad familiarity with objects contribute to our excellent perception of moving objects. The human ability to detect and segment moving objects works in the presence of multiple objects, complex background geometry, motion of the observer, and even camouflage. How humans perceive moving objects so reliably is a longstanding research question in computer vision, one that borrows findings from related areas such as psychology, cognitive science, and physics. One approach to the problem is to teach a deep network to model all of these effects. This contrasts with the strategy used by human vision, where cognitive processes and body design are tightly coupled and each is responsible for certain aspects of correctly identifying moving objects. Similarly, from the computer vision perspective, there is evidence that classical, geometry-based techniques are better suited to the "motion-based" parts of the problem, while deep networks are more suitable for modeling appearance. In this work, we argue that the coupling of camera rotation and camera translation can create complex motion fields that are difficult for a deep network to untangle directly. We present a novel probabilistic model to estimate the camera's rotation given the motion field. We then rectify the flow field to obtain a rotation-compensated motion field for subsequent segmentation. This strategy of first estimating camera motion, and then allowing a network to learn the remaining parts of the problem, yields improved results on the widely used DAVIS benchmark as well as on the recently published motion segmentation dataset MoCA (Moving Camouflaged Animals).
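To make the rectification step concrete, the following is a minimal sketch of rotation compensation. It is not the paper's implementation, and it does not cover the probabilistic rotation estimator, which the abstract does not detail. The sketch assumes the rotation `omega = (a, b, c)` is already known, a pinhole camera with focal length in pixels, a principal point at the image centre, and the standard small-rotation motion field equations under one common sign convention. The function names and the flow layout (rows of `(u, v)` tuples) are hypothetical.

```python
def rotational_flow_at(x, y, omega, focal_length):
    """Flow induced at image point (x, y) by a small camera rotation.

    x, y are pixel offsets from the principal point; omega = (a, b, c)
    holds the rotation rates about the camera's x, y and z axes
    (assumed convention). This component does not depend on scene depth.
    """
    a, b, c = omega
    f = focal_length
    u = a * x * y / f - b * (f + x * x / f) + c * y
    v = a * (f + y * y / f) - b * x * y / f - c * x
    return u, v


def compensate_rotation(flow, omega, focal_length):
    """Subtract the rotational component from an observed flow field.

    flow is a list of rows, each row a list of (u, v) tuples. The
    principal point is assumed to lie at the image centre.
    """
    height = len(flow)
    width = len(flow[0])
    cx = (width - 1) / 2.0
    cy = (height - 1) / 2.0
    compensated = []
    for row_idx, row in enumerate(flow):
        out_row = []
        for col_idx, (u, v) in enumerate(row):
            u_rot, v_rot = rotational_flow_at(
                col_idx - cx, row_idx - cy, omega, focal_length
            )
            out_row.append((u - u_rot, v - v_rot))
        compensated.append(out_row)
    return compensated
```

Under these assumptions, what remains after subtraction is due only to camera translation and independently moving objects. This is the simpler field that a segmentation network would then receive.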