Paper Title
Recurrent Dynamic Embedding for Video Object Segmentation
Paper Authors
Paper Abstract
Space-time memory (STM) based video object segmentation (VOS) networks usually expand the memory bank every few frames, which yields excellent performance. However, 1) the hardware cannot afford the ever-growing memory requirements as video length increases, and 2) storing so much information inevitably introduces noise, which hinders reading the most important information from the memory bank. In this paper, we propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size. Specifically, we explicitly generate and update the RDE with the proposed Spatio-temporal Aggregation Module (SAM), which exploits cues from historical information. To avoid error accumulation caused by the recurrent use of SAM, we propose an unbiased guidance loss during training, which makes SAM more robust on long videos. Moreover, the predicted masks stored in the memory bank are inaccurate due to imperfect network inference, which affects the segmentation of the query frame. To address this problem, we design a novel self-correction strategy so that the network can repair the embeddings of masks of different qualities in the memory bank. Extensive experiments show that our method achieves the best trade-off between performance and speed. Code is available at https://github.com/Limingxing00/RDE-VOS-CVPR2022.
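The core contrast in the abstract can be sketched as follows: an STM-style bank appends key/value features for every memorized frame (storage grows with video length), while an RDE-style approach recurrently folds each new frame into a fixed-size embedding. This is a minimal toy illustration, not the paper's implementation: the fixed mixing weight `alpha` and the `update_rde` function are hypothetical stand-ins for the learned Spatio-temporal Aggregation Module (SAM).

```python
import numpy as np

def update_rde(rde, key, value, alpha=0.5):
    """Toy recurrent update: fold the new frame's key/value features into a
    constant-size embedding instead of appending them to a growing bank.
    `alpha` is a hypothetical fixed mixing weight; the paper's SAM learns
    this aggregation rather than applying a hand-set rule."""
    frame_feat = np.concatenate([key, value], axis=-1)
    return alpha * rde + (1.0 - alpha) * frame_feat

# STM-style growing bank vs. constant-size RDE over a 100-frame video.
bank = []                 # O(T) storage: one entry per memorized frame
rde = np.zeros(8)         # O(1) storage: constant-size embedding
for t in range(100):
    key, value = np.random.rand(4), np.random.rand(4)  # toy frame features
    bank.append((key, value))            # memory grows with video length
    rde = update_rde(rde, key, value)    # memory stays constant

print(len(bank))          # grows to 100
print(rde.shape)          # stays (8,)
```

The fixed-size embedding caps memory use regardless of video length, which is exactly the hardware constraint the paper targets; the trade-off is that aggregation must be learned carefully (hence the unbiased guidance loss) so that recurrent updates do not accumulate errors on long videos.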