Paper Title
Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting
Paper Authors
Paper Abstract
Crowd counting is a fundamental yet challenging task that requires rich information to generate pixel-wise crowd density maps. However, most previous methods use only the limited information in RGB images and cannot reliably detect potential pedestrians in unconstrained scenarios. In this work, we find that incorporating optical and thermal information can greatly help to recognize pedestrians. To promote future research in this field, we introduce a large-scale RGBT Crowd Counting (RGBT-CC) benchmark, which contains 2,030 pairs of RGB-thermal images with 138,389 annotated people. Furthermore, to facilitate multimodal crowd counting, we propose a cross-modal collaborative representation learning framework, which consists of multiple modality-specific branches, a modality-shared branch, and an Information Aggregation-Distribution Module (IADM) to fully capture the complementary information of different modalities. Specifically, our IADM incorporates two collaborative information transfers to dynamically enhance the modality-shared and modality-specific representations with a dual information propagation mechanism. Extensive experiments conducted on the RGBT-CC benchmark demonstrate the effectiveness of our framework for RGBT crowd counting. Moreover, the proposed approach generalizes to multimodal crowd counting and also achieves superior performance on the ShanghaiTechRGBD dataset. Finally, our source code and benchmark are released at http://lingboliu.com/RGBT_Crowd_Counting.html.
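For readers unfamiliar with the pixel-wise density maps mentioned in the abstract, the following is a minimal sketch of how such a map is commonly built from point annotations. It is not the authors' code. The function names, the fixed Gaussian width `sigma`, and the border renormalization are all assumptions made for illustration. Each annotated person contributes a Gaussian blob of total mass 1, so summing the map recovers the head count.

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Return a 2-D Gaussian kernel of shape (size, size) that sums to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def density_map(points, height: int, width: int, sigma: float = 4.0) -> np.ndarray:
    """Build a density map from (x, y) head annotations.

    Hypothetical helper: `sigma` is an assumed fixed kernel width, not a
    value taken from the paper.
    """
    size = int(6 * sigma) | 1          # odd window covering about +/- 3 sigma
    half = size // 2
    kernel = gaussian_kernel(size, sigma)
    density = np.zeros((height, width), dtype=np.float64)
    for x, y in points:
        col, row = int(round(x)), int(round(y))
        if not (0 <= row < height and 0 <= col < width):
            continue                   # skip annotations that fall outside the image
        # Clip the kernel window to the image bounds.
        r0, r1 = max(row - half, 0), min(row + half + 1, height)
        c0, c1 = max(col - half, 0), min(col + half + 1, width)
        patch = kernel[r0 - row + half:r1 - row + half,
                       c0 - col + half:c1 - col + half]
        # Renormalize so a person near the border still contributes mass 1.
        density[r0:r1, c0:c1] += patch / patch.sum()
    return density


if __name__ == "__main__":
    annotations = [(10, 10), (0, 0), (49.0, 29.0)]   # made-up head positions
    dmap = density_map(annotations, height=30, width=50)
    print(dmap.shape, round(float(dmap.sum()), 6))   # estimated count = map sum
```

In a counting network, a map like this serves as the regression target, and the predicted count is the sum over the predicted map. For RGBT input, the RGB image and its paired thermal image would share one target map.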