Paper Title

Unified Mandarin TTS Front-end Based on Distilled BERT Model

Paper Authors

Yang Zhang, Liqun Deng, Yasheng Wang

Paper Abstract

The front-end module in a typical Mandarin text-to-speech (TTS) system is composed of a long pipeline of text processing components, which requires extensive effort to build and is prone to a large accumulative model size and cascade errors. In this paper, a pre-trained language model (PLM) based model is proposed to simultaneously tackle the two most important tasks in the TTS front-end, i.e., prosodic structure prediction (PSP) and grapheme-to-phoneme (G2P) conversion. We use a pre-trained Chinese BERT [1] as the text encoder and employ a multi-task learning technique to adapt it to the two TTS front-end tasks. Then, the BERT encoder is distilled into a smaller model with a knowledge distillation technique called TinyBERT [2], making the whole model 25% of the size of the benchmark pipeline models while maintaining competitive performance on both tasks. With the proposed methods, we are able to run the whole TTS front-end module in a light and unified manner, which is friendlier for deployment on mobile devices.
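As a rough illustration (not the authors' released code), the sketch below shows how such a unified front-end could be assembled with the Hugging Face transformers library: a shared pre-trained Chinese BERT encoder feeding two token-level classification heads, one for PSP labels and one for phoneme/polyphone labels in G2P. The class name, label-set sizes, and head design are assumptions for illustration only; a distilled encoder produced with TinyBERT could be loaded through the same from_pretrained call.

    # Minimal multi-task front-end sketch, assuming a shared BERT encoder
    # with two linear heads over per-token hidden states.
    import torch.nn as nn
    from transformers import BertModel

    class UnifiedFrontEnd(nn.Module):
        def __init__(self, num_psp_labels=4, num_phone_labels=1000):  # illustrative sizes
            super().__init__()
            # Shared pre-trained Chinese encoder (later replaceable by a distilled model).
            self.encoder = BertModel.from_pretrained("bert-base-chinese")
            hidden = self.encoder.config.hidden_size
            # Task-specific heads: prosodic structure prediction and G2P classification.
            self.psp_head = nn.Linear(hidden, num_psp_labels)
            self.g2p_head = nn.Linear(hidden, num_phone_labels)

        def forward(self, input_ids, attention_mask):
            states = self.encoder(input_ids=input_ids,
                                  attention_mask=attention_mask).last_hidden_state
            return self.psp_head(states), self.g2p_head(states)

In this kind of setup, both heads share the encoder and are trained jointly with a weighted sum of the two task losses, which is one common way to realize the multi-task learning described above.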
