Paper Title

Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio

Paper Authors

Yan Gao, Javier Fernandez-Marques, Titouan Parcollet, Pedro P. B. de Gusmao, Nicholas D. Lane

Paper Abstract

Self-supervised learning (SSL) has proven vital in speech and audio-related applications. The paradigm trains a general model on unlabeled data that can later be used to solve specific downstream tasks. This type of model is costly to train as it requires manipulating long input sequences that can only be handled by powerful centralised servers. Surprisingly, despite many attempts to increase training efficiency through model compression, the effects of truncating input sequence lengths to reduce computation have not been studied. In this paper, we provide the first empirical study of SSL pre-training for different specified sequence lengths and link this to various downstream tasks. We find that training on short sequences can dramatically reduce resource costs while retaining a satisfactory performance for all tasks. This simple one-line change would promote the migration of SSL training from data centres to user-end edge devices for more realistic and personalised applications.
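
The "simple one-line change" the abstract refers to is truncating each input sequence before it reaches the SSL encoder. Below is a minimal sketch of that idea, assuming a PyTorch-style pipeline over raw 16 kHz waveforms; the function name, crop length, and sample rate are illustrative assumptions, not details taken from the paper.

```python
# Sketch of sequence-length truncation for SSL pre-training (hypothetical
# helper; the paper does not prescribe this exact implementation).
import torch

def crop_waveform(waveform: torch.Tensor,
                  max_seconds: float,
                  sample_rate: int = 16000) -> torch.Tensor:
    """Randomly crop a 1-D waveform to at most `max_seconds` of audio.

    Shorter inputs pass through unchanged; longer ones are cut to a
    random window, shrinking the sequence length the encoder must process.
    """
    max_samples = int(max_seconds * sample_rate)
    if waveform.numel() <= max_samples:
        return waveform
    start = torch.randint(0, waveform.numel() - max_samples + 1, (1,)).item()
    return waveform[start:start + max_samples]

# Example: truncate a 10 s clip to 2 s before feeding it to the model.
clip = torch.randn(10 * 16000)
short = crop_waveform(clip, max_seconds=2.0)
print(short.shape)  # torch.Size([32000])
```

Because self-attention cost grows quadratically with sequence length, cropping a 10 s clip to 2 s in this sketch cuts the attention computation per example by roughly 25x, which is the source of the resource savings the paper studies.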
