Motion Segmentation using Frequency Domain Transformer Networks

Paper Title

Motion Segmentation using Frequency Domain Transformer Networks

Authors

Hafez Farazi, Sven Behnke

Abstract

Self-supervised prediction is a powerful mechanism to learn representations that capture the underlying structure of the data. Despite recent progress, the self-supervised video prediction task is still challenging. One of the critical factors that make the task hard is motion segmentation, that is, segmenting individual objects and the background and estimating their motion separately. In video prediction, the shape, appearance, and transformation of each object should be understood only by predicting the next frame in pixel space. To address this task, we propose a novel end-to-end learnable architecture that predicts the next frame by modeling foreground and background separately while simultaneously estimating and predicting the foreground motion using Frequency Domain Transformer Networks. Experimental evaluations show that this yields interpretable representations and that our approach can outperform some widely used video prediction methods like Video Ladder Network and Predictive Gated Pyramids on synthetic data.
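The abstract only names the mechanism. As a rough illustration of the principle that frequency-domain motion models build on, the Fourier shift theorem (a translation changes only the phase of the spectrum), here is a minimal NumPy sketch. It assumes a single layer that translates globally and circularly at constant velocity, and it is not the authors' network: the published architecture is learned end to end, while everything below is fixed. The function names (`phase_difference`, `estimate_translation`, `predict_next`, `composite`) are hypothetical, and the alpha-blending step is an assumed stand-in for the learned foreground/background combination.

```python
import numpy as np


def phase_difference(prev, curr, eps=1e-8):
    # Normalized cross-power spectrum of two equally sized 2-D frames.
    # If curr is prev circularly shifted by d = (d_y, d_x), each frequency
    # (k_y, k_x) carries, up to eps, the pure phase factor
    # exp(-2j*pi*(k_y*d_y/H + k_x*d_x/W)): the motion lives in the phase.
    cross = np.fft.fft2(curr) * np.conj(np.fft.fft2(prev))
    return cross / (np.abs(cross) + eps)


def estimate_translation(prev, curr):
    # Phase correlation: the inverse FFT of the phase difference is
    # approximately a delta at the displacement d, so its argmax is d.
    corr = np.fft.ifft2(phase_difference(prev, curr)).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    return tuple(int(i) for i in peak)


def predict_next(prev, curr):
    # Constant-velocity extrapolation: apply the prev -> curr phase change
    # once more to curr, i.e. shift curr by the same displacement again.
    return np.fft.ifft2(np.fft.fft2(curr) * phase_difference(prev, curr)).real


def composite(fg, mask, bg):
    # Hypothetical fixed stand-in for a learned foreground/background
    # combination: alpha-blend the predicted foreground over the background.
    return mask * fg + (1.0 - mask) * bg


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f0 = rng.random((32, 32))
    f1 = np.roll(f0, shift=(2, 3), axis=(0, 1))  # moved 2 rows down, 3 columns right
    print(estimate_translation(f0, f1))  # (2, 3)
    f2 = predict_next(f0, f1)
    print(np.allclose(f2, np.roll(f0, shift=(4, 6), axis=(0, 1)), atol=1e-6))  # True
```

Displacements are recovered modulo the frame size, so a shift of -1 rows appears as H - 1. Real scenes break the single-global-motion assumption as soon as a foreground object moves over a differently moving background, which is the situation the abstract's separate foreground and background modeling targets.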
