Title
MMS-MSG: A Multi-purpose Multi-Speaker Mixture Signal Generator
Authors
Abstract
The scope of speech enhancement has changed from a monolithic view of single, independent tasks to the joint processing of complex conversational speech recordings. Training and evaluating these single tasks requires synthetic data that is as close as possible to the evaluation scenario and that provides access to intermediate signals. As such data is often not available, many works instead use specialized databases to train each system component, e.g., WSJ0-mix for source separation. We present a Multi-purpose Multi-Speaker Mixture Signal Generator (MMS-MSG) for generating a variety of speech mixture signals from any speech corpus, ranging from classical anechoic mixtures (e.g., WSJ0-mix) through reverberant mixtures (e.g., SMS-WSJ) to meeting-style data. Its highly modular and flexible structure allows for the simulation of diverse environments and dynamic mixing, while also making it easy to extend and modify the generator to produce new scenarios and mixture types. The generated meetings can be used for prototyping, evaluation, or training purposes. We provide example evaluation data and baseline results for meetings based on the WSJ corpus. Further, we demonstrate the usefulness for realistic scenarios by using MMS-MSG to provide training data for the LibriCSS database.