$P^2$ Net: Augmented Parallel-Pyramid Net for Attention Guided Pose Estimation

Paper Title


$P^2$ Net: Augmented Parallel-Pyramid Net for Attention Guided Pose Estimation

Authors

Hou, Luanxuan; Cao, Jie; Zhao, Yuan; Shen, Haifeng; Tang, Jian; He, Ran

Abstract


We propose an augmented Parallel-Pyramid Net ($P^2~Net$) with feature refinement by a dilated bottleneck and an attention module. For data preprocessing, we propose a differentiable auto data augmentation ($DA^2$) method. We formulate the search for a data augmentation policy in a differentiable form, so that the optimal policy setting can be updated by back-propagation during training. $DA^2$ improves training efficiency. A parallel-pyramid structure is adopted to compensate for the information loss introduced by the network. We design two novel fusion structures, Parallel Fusion and Progressive Fusion, to process the pyramid features from the backbone network. Both fusion structures effectively combine the rich spatial information available at high resolution with the semantic understanding available at low resolution. We propose a refinement stage for the pyramid features to further boost the accuracy of the network. The dilated bottleneck and attention module enlarge the receptive field of the features with limited added complexity and reweight the importance of different feature channels. To further refine the feature maps after the feature extraction stage, an Attention Module ($AM$) is defined to extract weighted features from the multi-scale feature maps generated by the parallel-pyramid structure. Compared with traditional up-sampling refinement, $AM$ better captures the relationships between channels. Experiments corroborate the effectiveness of the proposed method. Notably, our method achieves the best performance on the challenging MSCOCO and MPII datasets.
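The abstract describes the Attention Module only at a high level. To illustrate what "weighted features from different scale feature maps" can look like, here is a minimal NumPy sketch that assumes a squeeze-and-excitation style channel gate. The following are all hypothetical illustrative choices, not the paper's published design:

* the function names (`upsample_nearest`, `channel_attention`, `fuse_pyramid`)
* nearest-neighbour upsampling
* concatenation-based fusion
* the reduction ratio

```python
import numpy as np


def upsample_nearest(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) feature map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style channel gating (assumed stand-in for AM).

    x: (C, H, W) features; w1: (C // r, C); w2: (C, C // r).
    Returns the reweighted features and the per-channel gates.
    """
    z = x.mean(axis=(1, 2))                  # squeeze: global average pool -> (C,)
    s = np.maximum(w1 @ z, 0.0)              # bottleneck projection + ReLU -> (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ s)))  # sigmoid gate per channel, in (0, 1)
    return x * gates[:, None, None], gates   # scale every channel by its gate


def fuse_pyramid(features, w1, w2):
    """Hypothetical multi-scale fusion: upsample every pyramid level to the
    finest resolution, concatenate along channels, then gate the channels."""
    target_h = features[0].shape[1]
    resized = [upsample_nearest(f, target_h // f.shape[1]) for f in features]
    stacked = np.concatenate(resized, axis=0)
    return channel_attention(stacked, w1, w2)


# Usage: three pyramid levels at 16x16, 8x8 and 4x4, four channels each.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((4, s, s)) for s in (16, 8, 4)]
channels, reduction = 12, 4
w1 = 0.1 * rng.standard_normal((channels // reduction, channels))
w2 = 0.1 * rng.standard_normal((channels, channels // reduction))
out, gates = fuse_pyramid(feats, w1, w2)
print(out.shape)  # (12, 16, 16)
```

Each gate scales a whole channel, whichever pyramid level it came from. The gates are computed jointly from all channels, so the reweighting depends on cross-channel statistics. Plain up-sampling followed by summation or concatenation applies no such per-channel weighting.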


