Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR

Paper title

Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR

Authors

Thilo von Neumann, Christoph Boeddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach

Abstract

Most approaches to multi-talker overlapped speech separation and recognition assume that the number of simultaneously active speakers is given, but in realistic situations, it is typically unknown. To cope with this, we extend an iterative speech extraction system with mechanisms to count the number of sources and combine it with a single-talker speech recognizer to form the first end-to-end multi-talker automatic speech recognition system for an unknown number of active speakers. Our experiments show very promising performance in counting accuracy, source separation and speech recognition on simulated clean mixtures from WSJ0-2mix and WSJ0-3mix. Among others, we set a new state-of-the-art word error rate on the WSJ0-2mix database. Furthermore, our system generalizes well to a larger number of speakers than it ever saw during training, as shown in experiments with the WSJ0-4mix database.
