g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset

Paper Title

g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset

Authors

Kyubyong Park, Seanie Lee

Abstract

Grapheme-to-phoneme (G2P) conversion is an essential component of Mandarin Chinese text-to-speech (TTS) systems. One of the biggest challenges in Chinese G2P conversion is disambiguating the pronunciation of polyphones, that is, characters that have multiple pronunciations. Although many academic efforts have been made to address this problem, to date there has been no open dataset that can serve as a standard benchmark for fair comparison. In addition, most of the reported systems are hard to use for researchers or practitioners who simply want to convert Chinese text into pinyin. Motivated by these gaps, in this work we introduce a new benchmark dataset that consists of 99,000+ sentences for Chinese polyphone disambiguation. We train a simple neural network model on it and find that it outperforms other preexisting G2P systems. Finally, we package our project and share it on PyPI.
