Paper title
Continuous Speech Separation with Conformer
Paper authors
Abstract
Continuous speech separation plays a vital role in complicated speech-related tasks such as conversation transcription. The separation model extracts individual speaker signals from mixed speech. In this paper, we use transformer and conformer models in lieu of recurrent neural networks in the separation system, as we believe that capturing global information with self-attention-based methods is crucial for speech separation. Evaluated on the LibriCSS dataset, the conformer separation model achieves state-of-the-art results, with a relative 23.5% word error rate (WER) reduction from bi-directional LSTM (BLSTM) in the utterance-wise evaluation and a relative 15.4% WER reduction in the continuous evaluation.
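The abstract reports its gains as relative WER reductions. As a minimal sketch of how such a figure is usually computed, assuming the common definition (baseline WER minus new WER, divided by baseline WER): the function name `relative_wer_reduction` and the numbers below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction of new_wer versus baseline_wer, in percent.

    Assumes the usual definition: 100 * (baseline - new) / baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values (in percent), not results from the paper.
baseline = 20.0  # e.g. a BLSTM-style baseline
improved = 15.0  # e.g. a conformer-style model
print(relative_wer_reduction(baseline, improved))  # 25.0
```

Under this definition, a relative reduction depends on the baseline: the same absolute drop in WER yields a larger relative figure when the baseline WER is lower.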