Paper Title

Continuous Speech Separation with Conformer

Authors

Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Jinyu Li, Takuya Yoshioka, Chengyi Wang, Shujie Liu, Ming Zhou

Abstract

Continuous speech separation plays a vital role in complicated speech-related tasks such as conversation transcription. The separation model extracts a single speaker signal from mixed speech. In this paper, we use transformer and conformer in lieu of recurrent neural networks in the separation system, as we believe capturing global information with the self-attention based method is crucial for speech separation. Evaluated on the LibriCSS dataset, the conformer separation model achieves state-of-the-art results, with a relative 23.5% word error rate (WER) reduction from bi-directional LSTM (BLSTM) in the utterance-wise evaluation and a 15.4% WER reduction in the continuous evaluation.
