Paper title
Multi-channel target speech extraction with channel decorrelation and target speaker adaptation
Authors
Abstract
End-to-end approaches to single-channel target speech extraction have attracted widespread attention. However, studies of end-to-end multi-channel target speech extraction remain relatively limited. In this work, we propose two methods that exploit multi-channel spatial information to extract the target speech. The first uses a target speech adaptation layer in a parallel encoder architecture. The second is a channel decorrelation mechanism that extracts inter-channel differential information to enhance the multi-channel encoder representation. We compare the proposed methods with two strong state-of-the-art baselines. Experimental results on the multi-channel reverberant WSJ0 2-mix dataset show that the proposed methods achieve relative improvements of up to 11.2% in SDR and 11.5% in SiSDR, which are, to the best of our knowledge, the best reported results on this task.
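The abstract reports its gains as relative improvements in SDR and SiSDR. As a rough illustration of what those numbers mean, the sketch below computes SiSDR using the commonly used scale-invariant definition and then a relative improvement between two scores. The abstract does not say which evaluation tooling the authors used, so the function names `si_sdr` and `relative_improvement`, the mean-removal step, and the toy signals are all assumptions made for illustration, not the paper's implementation.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB for two equal-length sample sequences.

    Assumption: the common definition, where the estimate is projected
    onto the reference and the residual is treated as distortion.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")

    # Remove the mean of each signal (an assumed, commonly used step).
    mean_est = sum(estimate) / len(estimate)
    mean_ref = sum(reference) / len(reference)
    est = [x - mean_est for x in estimate]
    ref = [x - mean_ref for x in reference]

    # Optimal scaling of the reference that best explains the estimate.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    scale = dot / ref_energy

    target = [scale * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    # Note: a perfect estimate gives zero residual energy and would
    # divide by zero; real toolkits usually add a small epsilon.
    return 10.0 * math.log10(target_energy / residual_energy)


def relative_improvement(baseline, proposed):
    """Relative improvement of `proposed` over `baseline`, in percent."""
    return 100.0 * (proposed - baseline) / baseline


# Toy signals (hypothetical, not from the paper).
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [1.1, -0.9, 0.9, -1.1]
score = si_sdr(estimate, reference)

# Rescaling the estimate leaves the score essentially unchanged,
# which is the "scale-invariant" property.
scaled_score = si_sdr([3.0 * x for x in estimate], reference)

# Hypothetical baseline and proposed scores in dB, chosen only to
# show the arithmetic behind a relative-improvement percentage.
gain = relative_improvement(10.0, 11.0)
```

Because the estimate is projected onto the reference before the ratio is taken, a louder or quieter output is not rewarded or penalized, whereas plain SDR does not have this property. A relative improvement is a percentage of the baseline score in dB, so it is not the same as an absolute dB gain.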