Paper Title

Know Your Surroundings: Exploiting Scene Information for Object Tracking

Authors

Goutam Bhat, Martin Danelljan, Luc Van Gool, Radu Timofte

Abstract

Current state-of-the-art trackers only rely on a target appearance model in order to localize the object in each frame. Such approaches are however prone to fail in case of e.g. fast appearance changes or presence of distractor objects, where a target appearance model alone is insufficient for robust tracking. Having the knowledge about the presence and locations of other objects in the surrounding scene can be highly beneficial in such cases. This scene information can be propagated through the sequence and used to, for instance, explicitly avoid distractor objects and eliminate target candidate regions. In this work, we propose a novel tracking architecture which can utilize scene information for tracking. Our tracker represents such information as dense localized state vectors, which can encode, for example, if the local region is target, background, or distractor. These state vectors are propagated through the sequence and combined with the appearance model output to localize the target. Our network is learned to effectively utilize the scene information by directly maximizing tracking performance on video segments. The proposed approach sets a new state-of-the-art on 3 tracking benchmarks, achieving an AO score of 63.6% on the recent GOT-10k dataset.
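To make the core idea concrete, here is a minimal sketch (not the authors' implementation; grid size, state dimensions, and the fusion rule are illustrative assumptions) of dense localized state vectors that encode target / background / distractor beliefs per spatial cell, are propagated to the next frame, and are fused with an appearance-model score map so that a target-looking distractor is suppressed:

```python
import numpy as np

H, W = 4, 4   # coarse spatial grid (illustrative size)
DIM = 3       # state dims: [target, background, distractor]

def init_states(h, w):
    """Start with uniform (uninformative) state vectors at every cell."""
    return np.full((h, w, DIM), 1.0 / DIM)

def propagate(states, decay=0.9):
    """Carry states to the next frame, decaying toward uniform to model
    growing uncertainty (a hand-written stand-in for the learned recurrence)."""
    uniform = np.full_like(states, 1.0 / DIM)
    return decay * states + (1.0 - decay) * uniform

def update(states, appearance):
    """Fuse an appearance score map (high = target-like) into the states.
    Cells that look target-like but whose state says 'distractor' end up
    with low final confidence."""
    new = states.copy()
    new[..., 0] = 0.5 * states[..., 0] + 0.5 * appearance          # target
    new[..., 1] = 0.5 * states[..., 1] + 0.5 * (1.0 - appearance)  # background
    new /= new.sum(axis=-1, keepdims=True)
    confidence = appearance * new[..., 0] * (1.0 - new[..., 2])
    return new, confidence

states = init_states(H, W)
states[1, 1, 2] = 0.9                  # mark one cell as a known distractor
states[1, 1] /= states[1, 1].sum()

appearance = np.zeros((H, W))
appearance[1, 1] = 0.95                # distractor looks very target-like
appearance[2, 2] = 0.90                # true target

states = propagate(states)
states, conf = update(states, appearance)
target_cell = np.unravel_index(conf.argmax(), conf.shape)
print(tuple(map(int, target_cell)))    # (2, 2): the distractor cell is suppressed
```

Even though the distractor cell has the highest raw appearance score, its propagated distractor belief down-weights it, so the true target cell wins. In the paper this propagation and fusion are learned end-to-end by maximizing tracking performance on video segments rather than hand-designed as above.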
