Progressive Cross-modal Knowledge Distillation for Human Action Recognition

Paper Title

Progressive Cross-modal Knowledge Distillation for Human Action Recognition

Authors

Jianyuan Ni, Anne H. H. Ngu, Yan Yan

Abstract

Wearable sensor-based Human Action Recognition (HAR) has recently achieved remarkable success. However, its accuracy still lags far behind that of systems based on visual modalities (i.e., RGB video, skeleton, and depth). Diverse input modalities can provide complementary cues and thus improve HAR accuracy, but how to exploit multi-modal data for wearable sensor-based HAR has rarely been explored. Currently, wearable devices, i.e., smartwatches, can capture only limited kinds of non-visual modality data. This hinders multi-modal HAR, because visual and non-visual modality data cannot be used simultaneously. Another major challenge is how to use multi-modal data efficiently on wearable devices with limited computation resources. In this work, we propose a novel Progressive Skeleton-to-sensor Knowledge Distillation (PSKD) model that uses only time-series data, i.e., accelerometer data, from a smartwatch to solve the wearable sensor-based HAR problem. Specifically, we construct multiple teacher models using data from both the teacher (human skeleton sequence) and student (time-series accelerometer data) modalities. In addition, we propose an effective progressive learning scheme to eliminate the performance gap between the teacher and student models. We also design a novel loss function, called Adaptive-Confidence Semantic (ACS), that allows the student model to adaptively select either one of the teacher models or the ground-truth label to mimic. To demonstrate the effectiveness of the proposed PSKD method, we conduct extensive experiments on the Berkeley-MHAD, UTD-MHAD, and MMAct datasets. The results confirm that the proposed PSKD method performs competitively with previous mono sensor-based HAR methods.
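
The abstract does not define the ACS loss, so the following is only a rough sketch of the general idea of distilling from several teachers while falling back to the ground-truth label. It uses the standard temperature-softened distillation loss, and the selection rule in `select_target` (pick the correct teacher that is most confident on the true class, otherwise use the label alone) is a hypothetical stand-in, not the paper's method. All function names, the temperature, and the weight `alpha` are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Cross-entropy of a probability vector against a hard label."""
    return -math.log(probs[label] + 1e-12)


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + 1e-12) / (qi + 1e-12)) for pi, qi in zip(p, q))


def select_target(teacher_logits_list, label):
    """Hypothetical selection rule (NOT the paper's ACS definition).

    Returns the index of the teacher that predicts the true class with the
    highest confidence, or None if no teacher predicts the true class.
    """
    best_idx, best_conf = None, 0.0
    for idx, logits in enumerate(teacher_logits_list):
        probs = softmax(logits)
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted == label and probs[label] > best_conf:
            best_idx, best_conf = idx, probs[label]
    return best_idx


def distillation_loss(student_logits, teacher_logits_list, label,
                      temperature=2.0, alpha=0.5):
    """Blend a soft teacher-matching term with the hard-label term."""
    hard = cross_entropy(softmax(student_logits), label)
    chosen = select_target(teacher_logits_list, label)
    if chosen is None:
        # No teacher is correct: learn from the ground-truth label only.
        return hard
    p_teacher = softmax(teacher_logits_list[chosen], temperature)
    p_student = softmax(student_logits, temperature)
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    return alpha * soft + (1 - alpha) * hard
```

In this sketch the student (accelerometer) logits are compared against whichever teacher is selected for each sample; when every teacher is wrong, the loss reduces to ordinary cross-entropy on the label.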
