Paper Title
SRNet: Improving Generalization in 3D Human Pose Estimation with a Split-and-Recombine Approach
Paper Authors
Paper Abstract
Human poses that are rare or unseen in a training set are challenging for a network to predict. Similar to the long-tailed distribution problem in visual recognition, the small number of examples for such poses limits the ability of networks to model them. Interestingly, local pose distributions suffer less from the long-tail problem, i.e., local joint configurations within a rare pose may appear within other poses in the training set, making them less rare. We propose to take advantage of this fact for better generalization to rare and unseen poses. To be specific, our method splits the body into local regions and processes them in separate network branches, utilizing the property that a joint position depends mainly on the joints within its local body region. Global coherence is maintained by recombining the global context from the rest of the body into each branch as a low-dimensional vector. With the reduced dimensionality of less relevant body areas, the training set distribution within network branches more closely reflects the statistics of local poses instead of global body poses, without sacrificing information important for joint inference. The proposed split-and-recombine approach, called SRNet, can be easily adapted to both single-image and temporal models, and it leads to appreciable improvements in the prediction of rare and unseen poses.
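To make the split-and-recombine idea in the abstract concrete, below is a minimal PyTorch sketch of one such block, not the authors' released SRNet code. The joint grouping into body regions, the layer widths, the context dimensionality (`ctx_dim`), and the 2D-to-3D lifting setup with a 17-joint skeleton are all illustrative assumptions: each region is handled by its own branch, while the remaining joints are compressed into a low-dimensional context vector that is recombined into that branch's input.

```python
# Sketch of a split-and-recombine block (assumptions: 17-joint skeleton,
# hypothetical region grouping, illustrative layer sizes; not the paper's code).
import torch
import torch.nn as nn

# Hypothetical grouping of 17 joints into 5 overlapping local regions.
REGIONS = [
    [0, 1, 2, 3],      # pelvis + right leg
    [0, 4, 5, 6],      # pelvis + left leg
    [0, 7, 8, 9, 10],  # torso + head
    [8, 11, 12, 13],   # thorax + left arm
    [8, 14, 15, 16],   # thorax + right arm
]

class SplitRecombineBlock(nn.Module):
    """Each branch sees its own region's joints in full, plus a low-dimensional
    summary of the rest of the body (the recombined global context)."""
    def __init__(self, num_joints=17, in_dim=2, ctx_dim=4, hidden=64):
        super().__init__()
        self.num_joints = num_joints
        self.regions = REGIONS
        self.ctx_proj = nn.ModuleList()   # compress the rest of the body
        self.branches = nn.ModuleList()   # per-region local predictors
        for region in self.regions:
            rest = num_joints - len(region)
            self.ctx_proj.append(nn.Linear(rest * in_dim, ctx_dim))
            self.branches.append(nn.Sequential(
                nn.Linear(len(region) * in_dim + ctx_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, len(region) * 3),
            ))

    def forward(self, joints_2d):
        # joints_2d: (batch, num_joints, in_dim), e.g., detected 2D keypoints.
        batch = joints_2d.shape[0]
        per_joint = [[] for _ in range(self.num_joints)]
        for region, proj, branch in zip(self.regions, self.ctx_proj, self.branches):
            rest = [j for j in range(self.num_joints) if j not in region]
            local = joints_2d[:, region, :].flatten(1)
            ctx = proj(joints_2d[:, rest, :].flatten(1))          # low-dim global context
            pred = branch(torch.cat([local, ctx], dim=1)).view(batch, len(region), 3)
            for k, j in enumerate(region):
                per_joint[j].append(pred[:, k, :])
        # Recombine: average the predictions for joints shared between regions.
        return torch.stack(
            [torch.stack(p, dim=0).mean(dim=0) for p in per_joint], dim=1)

if __name__ == "__main__":
    model = SplitRecombineBlock()
    pose_2d = torch.randn(8, 17, 2)   # a batch of 2D poses
    pose_3d = model(pose_2d)
    print(pose_3d.shape)              # torch.Size([8, 17, 3])
```

The point of the sketch is the data routing rather than the exact layer choices: because each branch's input is dominated by its local joints, the distribution it must model is that of local poses, while the small context vector preserves global coherence.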