Paper Title

Self-supervised Graphs for Audio Representation Learning with Limited Labeled Data

Authors

Amir Shirian, Krishna Somandepalli, Tanaya Guha

Abstract

Large-scale databases with high-quality manual annotations are scarce in the audio domain. We therefore explore a self-supervised graph approach to learning audio representations from highly limited labeled data. Considering each audio sample as a graph node, we propose a subgraph-based framework with novel self-supervision tasks that can learn effective audio representations. During training, subgraphs are constructed by sampling from the entire pool of available training data to exploit the relationship between labeled and unlabeled audio samples. During inference, we use random edges to alleviate the overhead of graph construction. We evaluate our model on three benchmark audio databases and two tasks: acoustic event detection and speech emotion recognition. Our semi-supervised model performs better than or on par with fully supervised models, and outperforms several competitive existing models. Our model is compact (240k parameters) and can produce generalized audio representations that are robust to different types of signal noise.
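As a rough illustration of the idea described in the abstract, the sketch below samples a training subgraph that mixes labeled and unlabeled audio nodes and draws random edges at inference. This is not the authors' implementation: the function names, the fully connected subgraph choice, and the PyTorch framing are all assumptions made for illustration only.

```python
# Hypothetical sketch of the subgraph idea from the abstract (not the paper's code).
import torch

def sample_training_subgraph(features, labeled_idx, unlabeled_idx, k=16):
    """Sample a subgraph mixing k labeled and k unlabeled audio nodes (assumed scheme)."""
    lab = labeled_idx[torch.randperm(len(labeled_idx))[:k]]
    unlab = unlabeled_idx[torch.randperm(len(unlabeled_idx))[:k]]
    nodes = torch.cat([lab, unlab])          # node set of the sampled subgraph
    x = features[nodes]                      # node features: per-clip audio embeddings
    # Fully connect the sampled nodes so relations between labeled and
    # unlabeled clips can be exploited by a graph neural network.
    n = len(nodes)
    src, dst = torch.meshgrid(torch.arange(n), torch.arange(n), indexing="ij")
    edge_index = torch.stack([src.flatten(), dst.flatten()])
    return x, edge_index

def random_edges(num_nodes, num_edges):
    """At inference, draw random edges instead of constructing a graph explicitly."""
    return torch.randint(0, num_nodes, (2, num_edges))

# Toy usage with random stand-in "audio embeddings".
feats = torch.randn(100, 128)
x, edges = sample_training_subgraph(feats, torch.arange(0, 20), torch.arange(20, 100))
inference_edges = random_edges(num_nodes=10, num_edges=40)
```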
