Paper Title
Shallow Feature Based Dense Attention Network for Crowd Counting
Authors
Abstract
Although deep learning has dramatically improved crowd counting performance in recent years, the task remains difficult because of cluttered backgrounds and the varying scales of people within an image. In this paper, we propose a Shallow feature based Dense Attention Network (SDANet) for crowd counting from still images. SDANet diminishes the impact of backgrounds through a shallow feature based attention model and captures multi-scale information by densely connecting hierarchical image features. Specifically, inspired by the observation that backgrounds and human crowds generally produce noticeably different responses in shallow features, we build our attention model upon shallow-feature maps, which results in accurate background-pixel detection. Moreover, because the most representative features of people at different scales can appear in different layers of a feature extraction network, we propose to densely connect the hierarchical image features of different layers, so that all of them are retained, and then encode them to estimate crowd density. Experimental results on three benchmark datasets clearly demonstrate the superiority of SDANet across different scenarios. In particular, on the challenging UCF_CC_50 dataset, our method outperforms existing methods by a large margin, with SDANet achieving a remarkable 11.9% drop in Mean Absolute Error (MAE).
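The evaluation quantities mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common crowd counting convention that a predicted count is the sum of a density map, and it assumes the reported MAE drop is a relative reduction against a baseline; the abstract does not state either point explicitly. All function names and numbers below are hypothetical and are not taken from the paper.

```python
def count_from_density(density_map):
    """Estimated head count: sum of all values in a 2-D density map.

    Assumption: standard density-map counting convention, not confirmed
    by the abstract.
    """
    return sum(sum(row) for row in density_map)


def mean_absolute_error(predicted_counts, true_counts):
    """MAE over a set of images: mean of |predicted - true| per image."""
    if not predicted_counts or len(predicted_counts) != len(true_counts):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - t) for p, t in zip(predicted_counts, true_counts))
    return total / len(predicted_counts)


def relative_drop(baseline_mae, new_mae):
    """Relative MAE reduction, e.g. 0.25 means a 25% drop.

    Assumption: the percentage in the abstract is measured this way.
    """
    return (baseline_mae - new_mae) / baseline_mae


# Toy values for illustration only (not results from the paper).
toy_density = [[0.5, 0.25], [0.25, 1.0]]
toy_count = count_from_density(toy_density)
toy_mae = mean_absolute_error([10, 20, 30], [12, 18, 33])
toy_drop = relative_drop(200.0, 150.0)
```

Under the relative-reduction assumption, an 11.9% drop would correspond to `relative_drop(baseline_mae, new_mae)` returning about 0.119 for the baseline and SDANet MAE values on UCF_CC_50.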