Paper Title
Automatic Data Augmentation Selection and Parametrization in Contrastive Self-Supervised Speech Representation Learning
Paper Authors
Paper Abstract
Contrastive learning enables learning useful audio and speech representations without ground-truth labels by maximizing the similarity between latent representations of similar signal segments. In this framework, various data augmentation techniques are usually exploited to help enforce desired invariances within the learned representations, improving performance on various audio tasks thanks to more robust embeddings. Selecting the most relevant augmentations has proven crucial for better downstream performance. Thus, this work introduces a conditional-independence-based method that automatically selects a suitable distribution over the choice of augmentations and their parametrization, from a set of predefined ones, for contrastive self-supervised pre-training. The selection is performed with respect to a downstream task of interest, thereby saving a costly hyper-parameter search. Experiments on two different downstream tasks validate the proposed approach, showing better results than pre-training without augmentation or with baseline augmentations. We furthermore conduct a qualitative analysis of the automatically selected augmentations and how they vary with the considered final downstream dataset.
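For illustration only (the abstract itself contains no code): below is a minimal PyTorch sketch of the contrastive pre-training setup the abstract describes, where two augmented views of the same audio segment are pulled together by an NT-Xent loss. The augmentation functions, parameter ranges, and toy encoder are assumptions made for this sketch and are fixed by hand; the paper's contribution is precisely to select such a distribution over augmentations and their parametrization automatically, via a conditional-independence criterion, rather than fixing it manually as done here.

```python
# Minimal sketch, not the paper's implementation: contrastive pre-training
# with augmentations sampled from a hand-fixed, parametrized set.
# All names (add_noise, time_mask, AUGMENTATIONS, ...) are illustrative.
import random

import torch
import torch.nn.functional as F


def add_noise(wav: torch.Tensor, snr_db: float) -> torch.Tensor:
    """Add Gaussian noise at a given signal-to-noise ratio (dB)."""
    noise = torch.randn_like(wav)
    signal_power = wav.pow(2).mean()
    noise_power = signal_power / (10 ** (snr_db / 10))
    return wav + noise * noise_power.sqrt()


def time_mask(wav: torch.Tensor, max_frac: float) -> torch.Tensor:
    """Zero out a random contiguous span of at most max_frac of the signal."""
    span = int(wav.shape[-1] * random.uniform(0.0, max_frac))
    start = random.randint(0, wav.shape[-1] - span) if span > 0 else 0
    out = wav.clone()
    out[..., start:start + span] = 0.0
    return out


# Candidate augmentations with parameter ranges. In the paper, the choice of
# augmentations and their parametrization is what gets selected automatically;
# here they are fixed by hand purely for demonstration.
AUGMENTATIONS = [
    lambda w: add_noise(w, snr_db=random.uniform(5.0, 20.0)),
    lambda w: time_mask(w, max_frac=0.1),
]


def two_views(wav: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Produce two independently augmented views of the same segment."""
    return random.choice(AUGMENTATIONS)(wav), random.choice(AUGMENTATIONS)(wav)


def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """NT-Xent contrastive loss between two batches of paired embeddings."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, D)
    sim = z @ z.t() / tau                                # cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # exclude self-pairs
    batch = z1.shape[0]
    targets = torch.cat([torch.arange(batch, 2 * batch),
                         torch.arange(0, batch)])        # index of each positive
    return F.cross_entropy(sim, targets)


if __name__ == "__main__":
    encoder = torch.nn.Sequential(  # stand-in for a real speech encoder
        torch.nn.Conv1d(1, 32, kernel_size=10, stride=5),
        torch.nn.ReLU(),
        torch.nn.AdaptiveAvgPool1d(1),
        torch.nn.Flatten(),
    )
    wavs = torch.randn(8, 1, 16000)                      # batch of 1 s @ 16 kHz
    v1, v2 = zip(*(two_views(w) for w in wavs))
    z1, z2 = encoder(torch.stack(v1)), encoder(torch.stack(v2))
    print(nt_xent(z1, z2).item())                        # loss for one step
```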