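Polyphonic sound event detection based on convolutional recurrent neural networks with semi-supervised loss function for DCASE challenge 2020 task 4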

Paper title

Polyphonic sound event detection based on convolutional recurrent neural networks with semi-supervised loss function for DCASE challenge 2020 task 4

Authors

Kim, Nam Kyun; Kim, Hong Kook

Abstract

This report proposes a polyphonic sound event detection (SED) method for DCASE 2020 Challenge Task 4. The method is based on semi-supervised learning so that it can handle the different combinations of training datasets: a weakly labeled dataset, an unlabeled dataset, and a strongly labeled synthetic dataset. Specifically, the target label of each audio clip from the weakly labeled or unlabeled dataset is first predicted using the mean teacher model, which is the DCASE 2020 baseline. The data with predicted labels are then used to train the proposed SED model, which consists of CNNs with skip connections and a self-attention mechanism, followed by RNNs. To compensate for erroneous predictions on the weakly labeled and unlabeled data, a semi-supervised loss function is employed for the proposed SED model. Several versions of the proposed SED model are implemented and evaluated on the validation set under different parameter settings of the semi-supervised loss function, and an ensemble that combines five-fold validation models is selected as the final model.
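
The abstract names a mean teacher model and a five-fold ensemble but gives no formulas. As a rough illustration only, the sketch below assumes the commonly used mean teacher formulation (teacher weights as an exponential moving average of student weights, with a mean-squared-error consistency term) and assumes that the fold models are combined by plain probability averaging. The function names, the `alpha` value, and the averaging rule are hypothetical and are not taken from the report; the authors' semi-supervised loss function is not reproduced here.

```python
from typing import Dict, List


def ema_update(teacher: Dict[str, float], student: Dict[str, float],
               alpha: float = 0.999) -> Dict[str, float]:
    """Move each teacher weight toward the student weight by an
    exponential moving average (alpha is an assumed smoothing factor)."""
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name]
            for name in teacher}


def consistency_loss(student_probs: List[float],
                     teacher_probs: List[float]) -> float:
    """Mean squared error between student and teacher class probabilities."""
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


def ensemble_average(fold_probs: List[List[float]]) -> List[float]:
    """Average per-class probabilities over fold models
    (assumed combination rule; the report may use a different one)."""
    n_models = len(fold_probs)
    return [sum(col) / n_models for col in zip(*fold_probs)]
```

In a training loop under these assumptions, `ema_update` would be called after each student optimization step, and `consistency_loss` would be added to the supervised classification loss with some weighting.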
