Paper Title
Exploit the potential of Multi-column architecture for Crowd Counting
Paper Authors
Paper Abstract
Crowd counting is an important yet challenging task in computer vision because of severe occlusion, complex backgrounds, and large scale variations. Multi-column architectures are widely adopted to overcome these challenges and yield state-of-the-art performance on many public benchmarks. However, such designs still have two issues, scale limitation and feature similarity, which restrict further performance improvements. In this paper, we propose a novel crowd counting framework called Pyramid Scale Network (PSNet) to explicitly address these issues. Specifically, for scale limitation, we adopt three Pyramid Scale Modules (PSM) to efficiently capture multi-scale features; these modules integrate a message passing mechanism and an attention mechanism into the multi-column architecture. Moreover, for feature similarity, we introduce a novel loss function named Multi-column variance loss, which makes the features learned by each column in a PSM appropriately different from each other. To the best of our knowledge, PSNet is the first work to explicitly address scale limitation and feature similarity in multi-column design. Extensive experiments on five benchmark datasets demonstrate the effectiveness of the proposed innovations as well as superior performance over the state of the art. Our code is publicly available at: https://github.com/oahunc/Pyramid_Scale_Network
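
The abstract does not state the formula of the Multi-column variance loss, so the sketch below is not the authors' loss. It only illustrates the general idea of penalizing similarity between column features, using a swapped-in measure: the mean pairwise cosine similarity between flattened feature vectors. The function names `cosine_similarity` and `column_similarity_penalty`, and the choice of cosine similarity itself, are assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Treat an all-zero vector as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def column_similarity_penalty(columns):
    """Mean pairwise cosine similarity across columns (hypothetical penalty).

    `columns` is a list of flattened feature vectors, one per column.
    A value near 1 means the columns learned nearly identical features;
    a value near 0 means they are close to orthogonal.
    """
    pairs = list(combinations(columns, 2))
    if not pairs:
        return 0.0
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Three columns with identical features versus three orthogonal columns.
redundant = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(column_similarity_penalty(redundant))
print(column_similarity_penalty(diverse))
```

Adding such a term to a counting loss with a small weight would discourage the columns from collapsing onto the same features. How strongly to weight it, and whether features should be "appropriately" rather than maximally different, are design choices that the abstract leaves to the full paper.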