Knowledge Distillation for Multi-Target Domain Adaptation in Real-Time Person Re-Identification

Paper Title

Knowledge Distillation for Multi-Target Domain Adaptation in Real-Time Person Re-Identification

Authors

Remigereau, Félix; Mekhazni, Djebril; Abdoli, Sajjad; Nguyen-Meidine, Le Thanh; Cruz, Rafael M. O.; Granger, Eric

Abstract

Despite the recent success of deep learning architectures, person re-identification (ReID) remains a challenging problem in real-world applications. Several unsupervised single-target domain adaptation (STDA) methods have recently been proposed to limit the decline in ReID accuracy caused by the domain shift that typically occurs between source and target video data. Given the multimodal nature of person ReID data (due to variations across camera viewpoints and capture conditions), training a common CNN backbone to address domain shifts across multiple target domains can provide an efficient solution for real-time ReID applications. Although multi-target domain adaptation (MTDA) has not been widely addressed in the ReID literature, a straightforward approach consists of blending different target datasets and performing STDA on the mixture to train a common CNN. However, this approach may lead to poor generalization, especially when blending a growing number of distinct target domains to train a smaller CNN. To alleviate this problem, we introduce a new MTDA method based on knowledge distillation (KD-ReID) that is suitable for real-time person ReID applications. Our method adapts a common lightweight student backbone CNN over the target domains by alternately distilling from multiple specialized teacher CNNs, each one adapted on data from a specific target domain. Extensive experiments conducted on several challenging person ReID datasets indicate that our approach outperforms state-of-the-art methods for MTDA, including blending methods, particularly when training a compact CNN backbone like OSNet. Results suggest that our flexible MTDA approach can be employed to design cost-effective ReID systems for real-time video surveillance applications.
