fairseq S2T: Fast Speech-to-Text Modeling with fairseq

Paper title

fairseq S2T: Fast Speech-to-Text Modeling with fairseq

Authors

Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino

Abstract

We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful design for scalability and extensibility. We provide end-to-end workflows from data pre-processing, model training to offline (online) inference. We implement state-of-the-art RNN-based, Transformer-based as well as Conformer-based models and open-source detailed training recipes. Fairseq's machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning or transfer learning. Fairseq S2T documentation and examples are available at https://github.com/pytorch/fairseq/tree/master/examples/speech_to_text.
