Paper Title
Exploiting Learnable Joint Groups for Hand Pose Estimation
Paper Authors
Paper Abstract
In this paper, we propose to estimate 3D hand pose by recovering the 3D coordinates of joints in a group-wise manner, where less-related joints are automatically categorized into different groups and exhibit different features. This differs from previous methods, where all joints are considered holistically and share the same feature. The benefits of our method are explained by the principle of multi-task learning (MTL): by separating less-related joints into different groups (as different tasks), our method learns different features for each of them, thereby efficiently avoiding negative transfer among less-related tasks (groups of joints). The key to our method is a novel binary selector that automatically assigns related joints to the same group. We implement such a selector with binary values stochastically sampled from a Concrete distribution, which is constructed using Gumbel softmax on trainable parameters. This preserves the differentiability of the whole network. We further exploit the features of these less-related groups by carrying out an additional feature-fusion scheme among them to learn more discriminative features. This is realized by applying multiple 1x1 convolutions to the concatenated features, where each joint group has its own 1x1 convolution for feature fusion. Detailed ablation analysis and extensive experiments on several benchmark datasets demonstrate the promising performance of the proposed method over state-of-the-art (SOTA) methods. In addition, as of the submission date, our method ranks top-1 among all methods that do not exploit dense 3D shape labels in the recently released FreiHAND competition. The source code and models are available at https://github.com/moranli-aca/LearnableGroups-Hand.
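The two mechanisms described above (a differentiable binary group selector sampled via Gumbel softmax, and per-group feature fusion with group-specific 1x1 convolutions) can be sketched in a few lines of PyTorch. The sketch below is illustrative only: the class and parameter names (GroupwiseJointHead, group_logits, tau), the per-joint feature layout, and the per-group regressor are assumptions, not the authors' actual implementation (see the linked repository for that).

```python
# Minimal sketch, assuming per-joint feature maps of shape (B, J, C, H, W).
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupwiseJointHead(nn.Module):
    """Assign joints to groups with a differentiable binary selector and fuse
    each group's features with a group-specific 1x1 convolution."""

    def __init__(self, num_joints=21, num_groups=4, feat_dim=64, tau=1.0):
        super().__init__()
        self.num_joints = num_joints
        self.num_groups = num_groups
        self.tau = tau
        # Trainable logits: each joint holds a categorical distribution over groups.
        self.group_logits = nn.Parameter(torch.zeros(num_joints, num_groups))
        # One 1x1 convolution per group, applied to the concatenated joint features.
        self.fuse = nn.ModuleList([
            nn.Conv2d(num_joints * feat_dim, feat_dim, kernel_size=1)
            for _ in range(num_groups)
        ])
        # One small regressor per group, mapping fused features to 3D joint coordinates.
        self.regress = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(feat_dim, num_joints * 3))
            for _ in range(num_groups)
        ])

    def forward(self, joint_feats):
        # joint_feats: (B, num_joints, feat_dim, H, W), one feature map per joint.
        B, J, C, H, W = joint_feats.shape
        # Binary selector: hard one-hot samples from the Gumbel-softmax (Concrete)
        # relaxation; the straight-through estimator keeps gradients flowing to
        # the trainable logits.
        select = F.gumbel_softmax(self.group_logits, tau=self.tau, hard=True)  # (J, G)

        coords = joint_feats.new_zeros(B, J, 3)
        for g in range(self.num_groups):
            # Zero out joints not selected into group g, then concatenate and fuse.
            mask = select[:, g].view(1, J, 1, 1, 1)
            masked = (joint_feats * mask).reshape(B, J * C, H, W)
            fused = self.fuse[g](masked)                      # group-specific 1x1 conv
            group_xyz = self.regress[g](fused).view(B, J, 3)  # predict all joints...
            # ...but keep only the joints assigned to this group.
            coords = coords + group_xyz * select[:, g].view(1, J, 1)
        return coords


if __name__ == "__main__":
    head = GroupwiseJointHead()
    feats = torch.randn(2, 21, 64, 32, 32)   # dummy per-joint features
    print(head(feats).shape)                  # torch.Size([2, 21, 3])
```

Because the hard Gumbel-softmax samples are one-hot yet backpropagate through their soft relaxation, the group assignment is learned end-to-end together with the rest of the network, which is what the abstract refers to as preserving the differentiability of the whole pipeline.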