Invariant Representation Learning for Infant Pose Estimation with Small Data

Paper Title

Invariant Representation Learning for Infant Pose Estimation with Small Data

Authors

Xiaofei Huang, Nihang Fu, Shuangjun Liu, Sarah Ostadabbas

Abstract

Infant motion analysis is critically important in early childhood development studies. However, although applications of human pose estimation have become increasingly broad, models trained on large-scale adult pose datasets are barely successful in estimating infant poses because of significant differences in body ratio and the versatility of infant poses. Moreover, privacy and security considerations hinder the availability of the infant pose data required to train a robust model from scratch. To address this problem, this paper presents (1) a publicly released hybrid synthetic and real infant pose (SyRIP) dataset containing a small yet diverse set of real infant images together with generated synthetic infant poses, and (2) a multi-stage invariant representation learning strategy that transfers knowledge from the adjacent domains of adult poses and synthetic infant images into our fine-tuned domain-adapted infant pose (FiDIP) estimation model. In our ablation study, with identical network structure, models trained on the SyRIP dataset show noticeable improvement over those trained on the only other public infant pose dataset. Integrated with pose estimation backbone networks of varying complexity, FiDIP performs consistently better than the fine-tuned versions of those models. One of our best infant pose estimation performers, built on the state-of-the-art DarkPose model, achieves a mean average precision (mAP) of 93.6.
