Paper Title
Source separation with weakly labelled data: An approach to computational auditory scene analysis
Paper Authors
Paper Abstract
Source separation is the task of separating an audio recording into individual sound sources, and it is fundamental to computational auditory scene analysis. Previous work on source separation has focused on separating particular sound classes such as speech and music, and much of it requires pairs of mixtures and clean sources for training. In this work, we propose a source separation framework trained with weakly labelled data. Weakly labelled data contains only the tags of an audio clip, without the occurrence times of sound events. We first train a sound event detection system on AudioSet. The trained sound event detection system is then used to detect the segments that are most likely to contain a target sound event. A regression is then learnt from a mixture of two randomly selected segments to a target segment, conditioned on the audio tagging prediction of the target segment. Our proposed system can separate all 527 sound classes in AudioSet within a single system. A U-Net is adopted as the separation system and achieves an average SDR of 5.67 dB over the 527 sound classes in AudioSet.
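
To make the training recipe in the abstract concrete, below is a minimal PyTorch sketch of one training step: two detected segments are mixed, and a small conditional network regresses from the mixture back to the target segment, conditioned on the target's tagging vector. This is an illustrative assumption, not the authors' implementation: the tiny one-level ConditionalUNet, the additive channel-bias conditioning, the L1 loss on magnitude spectrograms, and all tensor shapes are placeholders; the paper uses a full U-Net and the tagging output of the pretrained sound event detection system.

import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 527  # number of sound classes in AudioSet

# A deliberately small stand-in for the paper's U-Net: one encoder/decoder
# level, with the 527-dim tagging vector injected as a per-channel bias.
# The exact conditioning mechanism here is an assumption for illustration.
class ConditionalUNet(nn.Module):
    def __init__(self, cond_dim=NUM_CLASSES, ch=16):
        super().__init__()
        self.enc = nn.Conv2d(1, ch, kernel_size=3, stride=2, padding=1)
        self.cond = nn.Linear(cond_dim, ch)  # maps tag probabilities to a channel bias
        self.dec = nn.ConvTranspose2d(ch, 1, kernel_size=4, stride=2, padding=1)

    def forward(self, mixture, tags):
        # mixture: (batch, 1, freq, time) magnitude spectrogram of the mixed segments
        # tags:    (batch, NUM_CLASSES) tagging prediction for the target segment
        h = F.relu(self.enc(mixture))
        h = h + self.cond(tags)[:, :, None, None]  # condition on the target's tags
        mask = torch.sigmoid(self.dec(h))          # predicted spectrogram mask
        return mask * mixture                      # estimated target spectrogram

def training_step(model, seg_a, seg_b, tags_a, optimizer):
    # seg_a, seg_b: spectrogram segments the SED system flagged as likely to
    # contain their sound events; tags_a: tagging prediction on seg_a.
    mixture = seg_a + seg_b                # mix two randomly selected segments
    estimate = model(mixture, tags_a)      # separate, conditioned on seg_a's tags
    loss = F.l1_loss(estimate, seg_a)      # regress the estimate to the target segment
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random tensors standing in for real AudioSet spectrograms.
model = ConditionalUNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
seg_a = torch.rand(4, 1, 64, 64)
seg_b = torch.rand(4, 1, 64, 64)
tags_a = torch.rand(4, NUM_CLASSES)
print(training_step(model, seg_a, seg_b, tags_a, opt))

Because the conditioning vector selects which source to extract, a single model of this form can in principle cover all 527 classes: at inference time, the tagging vector for the desired class is supplied as the condition instead of being predicted from a clean target segment.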