ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration

Paper Title

ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration

Authors

Chenda Li, Jing Shi, Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Naoyuki Kamo, Moto Hira, Tomoki Hayashi, Christoph Boeddeker, Zhuo Chen, Shinji Watanabe

Abstract

We present ESPnet-SE, which is designed for the quick development of speech enhancement and speech separation systems in a single framework, along with an optional downstream speech recognition module. ESPnet-SE is a new project that integrates rich automatic speech recognition-related models, resources, and systems to support and validate the proposed front-end implementation (i.e., speech enhancement and separation). It is capable of processing both single-channel and multi-channel data, with various functionalities including dereverberation, denoising, and source separation. We provide all-in-one recipes, including data pre-processing, feature extraction, training, and evaluation pipelines, for a wide range of benchmark datasets. This paper describes the design of the toolkit, several important functionalities, especially the speech recognition integration, which differentiates ESPnet-SE from other open-source toolkits, and experimental results on major benchmark datasets.
