Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis

Paper title

Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis

Authors

Jiatong Shi, Shuai Guo, Tao Qian, Nan Huo, Tomoki Hayashi, Yuning Wu, Frank Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe, Qin Jin

Abstract

This paper introduces a new open-source platform named Muskits for end-to-end music processing, which mainly focuses on end-to-end singing voice synthesis (E2E-SVS). Muskits supports state-of-the-art SVS models, including RNN SVS, Transformer SVS, and XiaoiceSing. The design of Muskits follows the style of the widely used speech processing toolkits ESPnet and Kaldi for data preprocessing, training, and recipe pipelines. To the best of our knowledge, this toolkit is the first platform that allows a fair and highly reproducible comparison between several published works in SVS. In addition, we demonstrate several advanced usages based on the toolkit's functionalities, including multilingual training and transfer learning. This paper describes the major framework of Muskits, its functionalities, and experimental results in single-singer, multi-singer, multilingual, and transfer learning scenarios. The toolkit is publicly available at https://github.com/SJTMusicTeam/Muskits.
