Paper Title
One Model to Pronounce Them All: Multilingual Grapheme-to-Phoneme Conversion With a Transformer Ensemble
Paper Authors
Paper Abstract
The task of grapheme-to-phoneme (G2P) conversion is important for both speech recognition and synthesis. As with other speech and language processing tasks, learning G2P models is challenging when only small-sized training data are available. We describe a simple approach of exploiting model ensembles, based on multilingual Transformers and self-training, to develop a highly effective G2P solution for 15 languages. Our models were developed as part of our participation in the SIGMORPHON 2020 Shared Task 1, which focused on G2P. Our best models achieve 14.99 word error rate (WER) and 3.30 phoneme error rate (PER), a sizeable improvement over the shared task's competitive baselines.
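The abstract reports results in terms of word error rate (WER, the fraction of words whose predicted phoneme sequence differs from the reference) and phoneme error rate (PER, edit distance between predicted and reference phoneme sequences, normalized by reference length). A minimal sketch of how these two metrics are typically computed is below; the function names are illustrative, not taken from the paper or the shared-task tooling.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (lists of symbols)."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))  # one-row DP over the cost matrix
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                       # deletion
                dp[j - 1] + 1,                   # insertion
                prev + (a[i - 1] != b[j - 1]),   # substitution (0 if match)
            )
            prev = cur
    return dp[n]

def wer_per(references, predictions):
    """Corpus-level WER (%) and PER (%) over parallel phoneme sequences."""
    wrong = sum(r != p for r, p in zip(references, predictions))
    edits = sum(edit_distance(r, p) for r, p in zip(references, predictions))
    ref_len = sum(len(r) for r in references)
    return 100 * wrong / len(references), 100 * edits / ref_len
```

For example, if one of two predicted pronunciations has a single substituted phoneme out of six reference phonemes total, WER is 50% while PER is about 16.7%, illustrating why PER is the finer-grained of the two metrics.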