Paper Title

Unified Mandarin TTS Front-end Based on Distilled BERT Model

Paper Authors

Yang Zhang, Liqun Deng, Yasheng Wang

Paper Abstract

The front-end module in a typical Mandarin text-to-speech (TTS) system is composed of a long pipeline of text processing components, which requires extensive effort to build and is prone to a large accumulative model size and cascade errors. In this paper, a pre-trained language model (PLM) based model is proposed to simultaneously tackle the two most important tasks in the TTS front-end, i.e., prosodic structure prediction (PSP) and grapheme-to-phoneme (G2P) conversion. We use a pre-trained Chinese BERT [1] as the text encoder and employ a multi-task learning technique to adapt it to the two TTS front-end tasks. Then, the BERT encoder is distilled into a smaller model with a knowledge distillation technique called TinyBERT [2], making the whole model 25% of the size of the benchmark pipeline models while maintaining competitive performance on both tasks. With the proposed methods, we are able to run the whole TTS front-end module in a light and unified manner, which is friendlier for deployment on mobile devices.
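As a rough illustration (not the authors' released code), the sketch below shows how such a unified front-end could be assembled with the Hugging Face transformers library: a shared pre-trained Chinese BERT encoder feeding two token-level classification heads, one for PSP labels and one for phoneme/polyphone labels in G2P. The class name, label-set sizes, and head design are assumptions for illustration only; a distilled encoder produced with TinyBERT could be loaded through the same from_pretrained call.

    # Minimal multi-task front-end sketch, assuming a shared BERT encoder
    # with two linear heads over per-token hidden states.
    import torch.nn as nn
    from transformers import BertModel

    class UnifiedFrontEnd(nn.Module):
        def __init__(self, num_psp_labels=4, num_phone_labels=1000):  # illustrative sizes
            super().__init__()
            # Shared pre-trained Chinese encoder (later replaceable by a distilled model).
            self.encoder = BertModel.from_pretrained("bert-base-chinese")
            hidden = self.encoder.config.hidden_size
            # Task-specific heads: prosodic structure prediction and G2P classification.
            self.psp_head = nn.Linear(hidden, num_psp_labels)
            self.g2p_head = nn.Linear(hidden, num_phone_labels)

        def forward(self, input_ids, attention_mask):
            states = self.encoder(input_ids=input_ids,
                                  attention_mask=attention_mask).last_hidden_state
            return self.psp_head(states), self.g2p_head(states)

In this kind of setup, both heads share the encoder and are trained jointly with a weighted sum of the two task losses, which is one common way to realize the multi-task learning described above.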
