Domestic sound event detection by shift consistency mean-teacher training and adversarial domain adaptation

Paper title

Domestic sound event detection by shift consistency mean-teacher training and adversarial domain adaptation

Authors

Chen, Fang-Ching; Chen, Kuan-Dar; Liu, Yi-Wen

Abstract

Semi-supervised learning and domain adaptation techniques have drawn increasing attention in the field of domestic sound event detection, thanks to the availability of large amounts of unlabeled data and the relative ease of generating synthetic strongly-labeled data. In previous work, several semi-supervised learning strategies were designed to boost the performance of a mean-teacher model, namely shift consistency training (SCT), interpolation consistency training (ICT), and pseudo-labeling. However, adversarial domain adaptation (ADA) did not seem to improve the event detection accuracy further when we attempted to compensate for the domain gap between synthetic and real data. In this research, we found empirically that ICT tends to pull apart the distributions of synthetic and real data in t-SNE plots. Therefore, ICT is abandoned, while SCT is applied to train both the student and the teacher models. With these modifications, the system integrates successfully with an ADA network, and we achieve an F1 score of 47.2% on the DCASE 2020 task 4 dataset, which is 2.1% higher than what was reported in the previous work.
