Paper Title

GPU-accelerated Guided Source Separation for Meeting Transcription

Paper Authors

Desh Raj, Daniel Povey, Sanjeev Khudanpur

Paper Abstract

Guided source separation (GSS) is a type of target-speaker extraction method that relies on pre-computed speaker activities and blind source separation to perform front-end enhancement of overlapped speech signals. It was first proposed during the CHiME-5 challenge and provided significant improvements over the delay-and-sum beamforming baseline. Despite its strengths, however, the method has seen limited adoption for meeting transcription benchmarks primarily due to its high computation time. In this paper, we describe our improved implementation of GSS that leverages the power of modern GPU-based pipelines, including batched processing of frequencies and segments, to provide 300x speed-up over CPU-based inference. The improved inference time allows us to perform detailed ablation studies over several parameters of the GSS algorithm -- such as context duration, number of channels, and noise class, to name a few. We provide end-to-end reproducible pipelines for speaker-attributed transcription of popular meeting benchmarks: LibriCSS, AMI, and AliMeeting. Our code and recipes are publicly available: https://github.com/desh2608/gss.
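
To illustrate the kind of frequency-batched GPU computation the abstract refers to, the sketch below shows mask-based MVDR beamforming evaluated for all frequency bins in one batched solve instead of a per-frequency Python loop. This is an illustrative sketch only, not the gss library's actual API: the function name, tensor shapes, and the assumption that the target and noise masks are already available (in GSS they come from speaker-activity-guided CACGMM estimation) are all assumptions made here for brevity.

```python
# Illustrative sketch (not the gss library's API): batching per-frequency
# MVDR beamforming on the GPU with PyTorch. Masks are assumed precomputed.
import torch

def batched_mvdr_weights(stft, target_mask, noise_mask, eps=1e-8):
    """Compute MVDR beamforming weights for all frequency bins at once.

    stft:        (F, C, T) complex multi-channel STFT of the mixture
    target_mask: (F, T) time-frequency mask for the target speaker
    noise_mask:  (F, T) time-frequency mask for the residual/noise
    returns:     (F, C) complex beamforming weights (reference mic 0)
    """
    # Masked spatial covariance matrices, batched over frequency: (F, C, C)
    scm = lambda mask: torch.einsum(
        "ft,fct,fdt->fcd", mask.to(stft.dtype), stft, stft.conj())
    phi_s = scm(target_mask)
    phi_n = scm(noise_mask) + eps * torch.eye(
        stft.shape[1], dtype=stft.dtype, device=stft.device)

    # Souden MVDR: w_f = (Phi_n^-1 Phi_s) e_ref / trace(Phi_n^-1 Phi_s),
    # computed for every frequency bin in a single batched linear solve.
    numerator = torch.linalg.solve(phi_n, phi_s)            # (F, C, C)
    trace = numerator.diagonal(dim1=-2, dim2=-1).sum(-1)    # (F,)
    return numerator[..., 0] / (trace.unsqueeze(-1) + eps)  # (F, C)

# Usage: with stft and masks already on the GPU,
#   w = batched_mvdr_weights(stft, target_mask, noise_mask)
#   enhanced = (w.conj().unsqueeze(-1) * stft).sum(dim=1)   # (F, T)
```

Because the covariance accumulation and the linear solve are expressed as batched tensor operations, the GPU processes all frequency bins (and, in the same spirit, multiple utterance segments) in parallel, which is the source of the large speed-up reported over a CPU loop.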
