Paper title
maskGRU: Tracking Small Objects in the Presence of Large Background Motions
Paper authors
Abstract
We propose a recurrent neural network-based spatio-temporal framework named maskGRU for the detection and tracking of small objects in videos. While there have been many developments in the area of object tracking in recent years, tracking a small moving object amid other moving objects and actors (such as a ball amid moving players in sports footage) continues to be a difficult task. Existing spatio-temporal networks, such as convolutional Gated Recurrent Units (convGRUs), are difficult to train and have trouble accurately tracking small objects under such conditions. To overcome these difficulties, we developed the maskGRU framework that uses a weighted sum of the internal hidden state produced by a convGRU and a 3-channel mask of the tracked object's predicted bounding box as the hidden state to be used at the next time step of the underlying convGRU. We believe the technique of incorporating a mask into the hidden state through a weighted sum has two benefits: controlling the effect of exploding gradients and introducing an attention-like mechanism into the network by indicating where in the previous video frame the object is located. Our experiments show that maskGRU outperforms convGRU at tracking objects that are small relative to the video resolution even in the presence of other moving objects.
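The hidden-state update described in the abstract can be illustrated with a minimal sketch. It covers only the blending step, not the convGRU itself. Everything beyond "a weighted sum of the hidden state and a 3-channel bounding-box mask" is an assumption made for illustration: the function names `box_mask` and `blend_hidden`, the weight `alpha`, the choice of weights that sum to 1, the binary mask values, the box coordinate convention, and a hidden state whose shape matches the mask.

```python
import numpy as np


def box_mask(height, width, box, channels=3):
    """Build a binary mask of shape (channels, height, width).

    `box` is (x0, y0, x1, y1) in pixels with x1 and y1 exclusive.
    This coordinate convention is an assumption, not taken from the paper.
    """
    x0, y0, x1, y1 = box
    mask = np.zeros((channels, height, width), dtype=np.float32)
    mask[:, y0:y1, x0:x1] = 1.0  # 1 inside the predicted box, 0 elsewhere
    return mask


def blend_hidden(hidden, mask, alpha=0.5):
    """Return the weighted sum used as the next-step hidden state.

    `alpha` is a hypothetical weight; the abstract does not state its value
    or whether the two weights sum to 1.
    """
    if hidden.shape != mask.shape:
        raise ValueError("hidden state and mask must have the same shape")
    return alpha * hidden + (1.0 - alpha) * mask


# Toy example: a constant 3-channel hidden state on a 4x6 grid.
hidden = np.full((3, 4, 6), 2.0, dtype=np.float32)
mask = box_mask(4, 6, box=(1, 1, 3, 3))
next_hidden = blend_hidden(hidden, mask, alpha=0.5)
```

In this sketch, `next_hidden` would be fed back to the convGRU in place of its own hidden state. Because the mask values are bounded, blending them in pulls the magnitude of the recurrent state toward a bounded range, and the nonzero region marks where the object was located in the previous frame; this is one possible reading of the two benefits the abstract names.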