Paper Title
Motion-Attentive Transition for Zero-Shot Video Object Segmentation
Paper Authors
Paper Abstract
In this paper, we present a novel Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation, which provides a new way of leveraging motion information to reinforce spatio-temporal object representation. An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder, which transforms appearance features into motion-attentive representations at each convolutional stage. In this way, the encoder becomes deeply interleaved, allowing for close hierarchical interactions between object motion and appearance. This is superior to the typical two-stream architecture, which treats motion and appearance separately in each stream and often suffers from overfitting to appearance information. Additionally, a bridge network is proposed to obtain a compact, discriminative and scale-sensitive representation of the multi-level encoder features, which is further fed into a decoder to produce the segmentation results. Extensive experiments on three challenging public benchmarks (i.e., DAVIS-16, FBMS and YouTube-Objects) show that our model achieves compelling performance against the state of the art.
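To make the idea of an asymmetric motion-attentive transition more concrete, below is a minimal, hypothetical PyTorch-style sketch of how such a block could be wired inside one stage of a two-stream encoder: motion-stream features act as queries that attend over appearance-stream features, and the attended context is folded back into the appearance representation. All module names, the soft-attention form, and the residual connection are illustrative assumptions; the actual MATNet formulation may differ.

```python
# Hypothetical sketch of a Motion-Attentive Transition (MAT) block.
# Motion features guide attention over appearance features so that the
# appearance stream is transformed into a motion-attentive representation.
import torch
import torch.nn as nn


class MotionAttentiveTransition(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        inter = channels // reduction
        # Project both streams into a shared low-dimensional space.
        self.query = nn.Conv2d(channels, inter, kernel_size=1)   # from the motion stream
        self.key = nn.Conv2d(channels, inter, kernel_size=1)     # from the appearance stream
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.out = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, appearance: torch.Tensor, motion: torch.Tensor) -> torch.Tensor:
        b, c, h, w = appearance.shape
        q = self.query(motion).flatten(2)        # (B, C', HW)
        k = self.key(appearance).flatten(2)      # (B, C', HW)
        v = self.value(appearance).flatten(2)    # (B, C,  HW)
        # Affinity between motion queries and appearance keys (soft attention).
        attn = torch.softmax(
            torch.bmm(q.transpose(1, 2), k) / (q.shape[1] ** 0.5), dim=-1
        )                                         # (B, HW, HW)
        attended = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        # Residual transition: appearance reinforced by motion-attentive context.
        return appearance + self.out(attended)


if __name__ == "__main__":
    mat = MotionAttentiveTransition(channels=256)
    app = torch.randn(2, 256, 32, 32)    # appearance-stream features at one encoder stage
    mot = torch.randn(2, 256, 32, 32)    # motion-stream (optical-flow) features
    print(mat(app, mot).shape)           # torch.Size([2, 256, 32, 32])
```

In a full two-stream encoder, one such block would sit at each convolutional stage, which is what lets motion and appearance interact hierarchically rather than being fused only at the end.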