Paper title

Multilingual Speech Translation with Efficient Finetuning of Pretrained Models

Authors

Xian Li, Changhan Wang, Yun Tang, Chau Tran, Yuqing Tang, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli

Abstract

We present a simple yet effective approach to building multilingual speech-to-text (ST) translation through efficient transfer learning from a pretrained speech encoder and text decoder. Our key finding is that minimalistic LNA (LayerNorm and Attention) finetuning can achieve zero-shot crosslingual and cross-modality transfer by finetuning less than 10% of the pretrained parameters. This makes it possible to leverage large pretrained models effectively at low training cost. Using wav2vec 2.0 for acoustic modeling and mBART for multilingual text generation, our approach sets a new state of the art for 34 translation directions (surpassing cascaded ST for 23 of them) on the large-scale multilingual ST benchmark CoVoST 2 (+6.4 BLEU on average across 15 En-X directions and +5.1 BLEU on average across 19 X-En directions). Our approach also demonstrates strong zero-shot performance in a many-to-many multilingual model (+5.7 BLEU on average across 18 non-English directions), making it an appealing way to attain high-quality speech translation with improved parameter and data efficiency.
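The idea of finetuning only LayerNorm and attention parameters can be illustrated with a minimal sketch. It is not the authors' implementation: the `Param` class, the function name `apply_lna_freeze`, the parameter names, the substring keywords, and the sizes are all hypothetical, chosen only to show how a small trainable subset might be selected by name while everything else is frozen.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Toy stand-in for a framework parameter (hypothetical)."""
    size: int
    requires_grad: bool = True


def apply_lna_freeze(named_params, trainable_keywords=("layer_norm", "attn")):
    """Freeze every parameter whose name contains none of the keywords.

    Returns (trainable_count, total_count), measured in parameter elements.
    The keyword substrings are an assumption about naming, not a fixed rule.
    """
    trainable = total = 0
    for name, param in named_params:
        keep = any(key in name for key in trainable_keywords)
        param.requires_grad = keep  # only LayerNorm/attention stay trainable
        total += param.size
        if keep:
            trainable += param.size
    return trainable, total


# Toy parameter table; names and sizes are made up for illustration.
toy_model = {
    "encoder.layers.0.self_attn.q_proj.weight": Param(16),
    "encoder.layers.0.layer_norm.weight": Param(4),
    "encoder.layers.0.fc1.weight": Param(64),
    "decoder.layers.0.encoder_attn.k_proj.weight": Param(16),
    "decoder.layers.0.fc2.weight": Param(64),
    "decoder.embed_tokens.weight": Param(200),
}

trainable, total = apply_lna_freeze(toy_model.items())
print(f"trainable fraction: {trainable / total:.2%}")
```

With a PyTorch model, the same loop could run over `model.named_parameters()` and use `param.numel()` in place of `size`. The substrings that identify LayerNorm and attention modules depend on the particular implementation, so they would need to be checked against the actual parameter names.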
