Continuous Speech Separation with Conformer

Paper title

Continuous Speech Separation with Conformer

Authors

Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Jinyu Li, Takuya Yoshioka, Chengyi Wang, Shujie Liu, Ming Zhou

Abstract

Continuous speech separation plays a vital role in complicated speech-related tasks such as conversation transcription. The separation model extracts a single speaker's signal from mixed speech. In this paper, we use Transformer and Conformer in lieu of recurrent neural networks in the separation system, as we believe that capturing global information with self-attention-based methods is crucial for speech separation. Evaluated on the LibriCSS dataset, the Conformer separation model achieves state-of-the-art results, with a relative 23.5% word error rate (WER) reduction over bi-directional LSTM (BLSTM) in the utterance-wise evaluation and a 15.4% WER reduction in the continuous evaluation.
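The argument above rests on one property of self-attention: every output frame is a weighted sum over all input frames, so the model sees the whole segment at once instead of passing context step by step as a recurrent layer does. The following is a minimal single-head sketch in plain Python that illustrates only that property. It is not the paper's Transformer or Conformer separation model; real layers add multi-head attention, feed-forward sublayers and, in the Conformer, a convolution module. The helper names (`matmul`, `self_attention`, etc.), the toy frames and the identity projection weights are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of a (n x k) and b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x holds one row per time frame (T x d). Each output row mixes all T
    frames, so its receptive field is the entire input segment.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = 1.0 / math.sqrt(len(k[0]))
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax(row) for row in scores]  # T x T attention map
    return matmul(weights, v)


# Toy demo with hypothetical 2-dimensional "frames" and identity projections.
frames = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [0.4, 0.4]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out = self_attention(frames, identity, identity, identity)

# Change only the LAST frame and recompute.
perturbed = frames[:-1] + [[2.0, -2.0]]
out_perturbed = self_attention(perturbed, identity, identity, identity)

# The FIRST frame's output changes too: attention is global over the segment.
print(out[0], out_perturbed[0])
```

A unidirectional recurrent layer would leave frame 0's output untouched when only a later frame changes; a BLSTM does propagate it, but only through many sequential steps, whereas attention connects any two frames directly in one layer.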
