Paper Title
TENet: Triple Excitation Network for Video Salient Object Detection
Authors
Abstract
In this paper, we propose a simple yet effective approach, named Triple Excitation Network, to reinforce the training of video salient object detection (VSOD) from three aspects: spatial, temporal, and online excitation. These excitation mechanisms are designed in the spirit of curriculum learning and aim to reduce learning ambiguities at the beginning of training by selectively exciting feature activations using the ground truth. We then gradually reduce the weight of the ground-truth excitation according to a curriculum rate and replace it with a curriculum complementary map for better and faster convergence. In particular, the spatial excitation strengthens feature activations for clear object boundaries, while the temporal excitation imposes motion to emphasize spatio-temporal salient regions. Together, the spatial and temporal excitations combat the saliency shifting problem and the conflict between spatial and temporal features in VSOD. Furthermore, our semi-curriculum learning design enables the first online refinement strategy for VSOD, which allows saliency responses to be excited and boosted during testing without re-training. The proposed triple excitation can easily be plugged into different VSOD methods. Extensive experiments show the effectiveness of all three excitation methods, and the proposed method outperforms state-of-the-art image and video salient object detection methods.
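The curriculum-style excitation described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the linear decay schedule, the use of the model's own prediction as the "complementary map", the residual form `f * (1 + e)`, and all function names are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the schedule, the blending rule, and the
# residual excitation form are assumptions, not the paper's formulation.

def curriculum_rate(step, total_steps):
    """Assumed schedule: decay linearly from 1.0 to 0.0 over total_steps."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return max(0.0, 1.0 - step / total_steps)


def excitation_map(ground_truth, prediction, rate):
    """Blend the ground truth with a complementary map, pixel by pixel.

    Here the complementary map is assumed to be the model's own prediction.
    With rate = 1.0 the map equals the ground truth; with rate = 0.0 it
    equals the prediction.
    """
    return [
        [rate * g + (1.0 - rate) * p for g, p in zip(g_row, p_row)]
        for g_row, p_row in zip(ground_truth, prediction)
    ]


def excite(features, excitation):
    """Amplify activations where the excitation map is high (residual form)."""
    return [
        [f * (1.0 + e) for f, e in zip(f_row, e_row)]
        for f_row, e_row in zip(features, excitation)
    ]


if __name__ == "__main__":
    gt = [[0.0, 1.0], [1.0, 0.0]]        # toy ground-truth saliency mask
    pred = [[0.2, 0.6], [0.4, 0.0]]      # toy predicted saliency map
    feats = [[1.0, 1.0], [1.0, 1.0]]     # toy single-channel feature map

    for step in (0, 5, 10):
        rate = curriculum_rate(step, total_steps=10)
        excited = excite(feats, excitation_map(gt, pred, rate))
        print(step, rate, excited)
```

Under these assumptions, training starts fully guided by the ground truth and ends fully guided by the prediction. Because the final stage no longer needs the ground truth, the same excitation step could in principle be reapplied at test time, which is the intuition behind the online refinement mentioned in the abstract.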