Paper Title

Explore Spatio-temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline

Paper Authors

Kailai Zhou, Yibo Wang, Tao Lv, Yunqian Li, Linsen Chen, Qiu Shen, Xun Cao

Paper Abstract

We tackle a rarely explored task named Insubstantial Object Detection (IOD), which aims to localize objects with the following characteristics: (1) amorphous shape with an indistinct boundary; (2) similarity to surroundings; (3) absence in color. Accordingly, insubstantial objects are far more challenging to distinguish in a single static frame, and the collaborative representation of spatial and temporal information is crucial. We therefore construct an IOD-Video dataset comprising 600 videos (141,017 frames) covering various distances, sizes, visibility levels, and scenes captured in different spectral ranges. In addition, we develop a spatio-temporal aggregation framework for IOD, in which different backbones are deployed and a spatio-temporal aggregation loss (STAloss) is carefully designed to leverage consistency along the time axis. Experiments conducted on the IOD-Video dataset demonstrate that spatio-temporal aggregation can significantly improve the performance of IOD. We hope our work will attract further research into this valuable yet challenging task. The code will be available at: \url{https://github.com/CalayZhou/IOD-Video}.
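
To make the idea of leveraging consistency along the time axis concrete, the sketch below shows a generic temporal-consistency penalty on per-frame box predictions in PyTorch. It is only an illustrative stand-in, not the paper's actual STAloss: the tensor layout (T frames, N objects, [cx, cy, w, h] boxes), the smooth-L1 penalty, and the function name `temporal_consistency_loss` are assumptions made for this sketch.

```python
# Minimal sketch (NOT the paper's STAloss): a temporal-consistency penalty
# that encourages box predictions for the same object to vary smoothly
# across consecutive frames. Box format and weighting are illustrative.
import torch
import torch.nn.functional as F


def temporal_consistency_loss(boxes_per_frame: torch.Tensor) -> torch.Tensor:
    """boxes_per_frame: (T, N, 4) tensor of [cx, cy, w, h] predictions for
    N objects over T consecutive frames (hypothetical layout)."""
    # Frame-to-frame differences along the time axis.
    deltas = boxes_per_frame[1:] - boxes_per_frame[:-1]  # (T-1, N, 4)
    # Smooth-L1 toward zero change: tolerates small motion, penalizes jitter.
    return F.smooth_l1_loss(deltas, torch.zeros_like(deltas))


if __name__ == "__main__":
    # Toy usage: 5 frames, 3 objects, random box predictions.
    preds = torch.rand(5, 3, 4, requires_grad=True)
    loss = temporal_consistency_loss(preds)
    loss.backward()
    print(float(loss))
```

In practice such a term would be added to the usual per-frame detection losses with a weighting coefficient; the exact formulation used by the authors is given in the paper and repository, not here.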
