Paper Title

Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion

Paper Authors

Alex Sokolov, Tracy Rohlin, Ariya Rastrow

Abstract


Grapheme-to-phoneme (G2P) models are a key component in Automatic Speech Recognition (ASR) systems, such as the ASR system in Alexa, as they are used to generate pronunciations for out-of-vocabulary words that do not exist in the pronunciation lexicons (mappings like "e c h o" to "E k oU"). Most G2P systems are monolingual and based on traditional joint-sequence based n-gram models [1,2]. As an alternative, we present a single end-to-end trained neural G2P model that shares the same encoder and decoder across multiple languages. This allows the model to utilize a combination of universal symbol inventories of Latin-like alphabets and cross-linguistically shared feature representations. Such a model is especially useful in scenarios involving low-resource languages and code switching/foreign words, where the pronunciations in one language need to be adapted to other locales or accents. We further experiment with a word language distribution vector as an additional training target in order to improve system performance by helping the model decouple pronunciations across a variety of languages in the parameter space. We show a 7.2% average improvement in phoneme error rate over low-resource languages and no degradation over high-resource ones compared to monolingual baselines.
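The multilingual setup described in the abstract rests on two ingredients: a shared symbol inventory pooled across languages (so one encoder/decoder serves them all), and a per-word language distribution used as an auxiliary training target. The following is a minimal illustrative sketch of how these could be derived from pronunciation lexicons; the toy lexicon entries and function names are assumptions for illustration, not the authors' code:

```python
# Toy pronunciation lexicons (hypothetical entries, X-SAMPA-like phonemes).
lexicons = {
    "en": {"echo": "E k oU", "data": "d eI t @"},
    "de": {"echo": "E C o:", "daten": "d a: t @ n"},
    "es": {"eco": "e k o"},
}

def build_shared_vocab(lexicons):
    """Pool graphemes and phonemes across all languages into one
    universal symbol inventory, shared by encoder and decoder."""
    graphemes, phonemes = set(), set()
    for lex in lexicons.values():
        for word, pron in lex.items():
            graphemes.update(word)          # characters of the word
            phonemes.update(pron.split())   # space-separated phonemes
    return sorted(graphemes), sorted(phonemes)

def language_distribution(word, lexicons):
    """Normalized vector over languages indicating which lexicons
    contain the word; usable as an auxiliary training target."""
    langs = sorted(lexicons)
    hits = [1.0 if word in lexicons[lang] else 0.0 for lang in langs]
    total = sum(hits) or 1.0  # avoid division by zero for OOV words
    return {lang: h / total for lang, h in zip(langs, hits)}

graphemes, phonemes = build_shared_vocab(lexicons)
dist = language_distribution("echo", lexicons)  # split between en and de
```

Here "echo" appears in both the English and German lexicons, so its language distribution vector assigns 0.5 to each, encoding the ambiguity the auxiliary target is meant to help the model resolve.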
