Paper Title
Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline
Authors
Abstract
With the growing popularity of multi-modal sensors, visible-thermal (RGB-T) object tracking aims to achieve robust performance and broader application scenarios under the guidance of objects' temperature information. However, the lack of paired training samples is the main bottleneck in unlocking the power of RGB-T tracking. Because collecting high-quality RGB-T sequences is laborious, recent benchmarks provide only test sequences. In this paper, we construct a large-scale, highly diverse benchmark for visible-thermal UAV tracking (VTUAV), comprising 500 sequences with 1.7 million high-resolution ($1920 \times 1080$ pixels) frame pairs. The benchmark covers comprehensive applications (short-term tracking, long-term tracking, and segmentation mask prediction) with diverse categories and scenes for exhaustive evaluation. Moreover, we provide coarse-to-fine attribute annotation, in which frame-level attributes are provided to exploit the potential of challenge-specific trackers. We also design a new RGB-T baseline, named Hierarchical Multi-modal Fusion Tracker (HMFT), which fuses RGB-T data at various levels. Numerous experiments on several datasets are conducted to reveal the effectiveness of HMFT and the complementarity of different fusion types. The project is available here.