Paper Title

Multi-class Spectral Clustering with Overlaps for Speaker Diarization

Authors

Desh Raj, Zili Huang, Sanjeev Khudanpur

Abstract

This paper describes a method for overlap-aware speaker diarization. Given an overlap detector and a speaker embedding extractor, our method performs spectral clustering of segments informed by the output of the overlap detector. This is achieved by relaxing the discrete clustering problem into a convex optimization problem that is solved by eigendecomposition. Thereafter, we discretize the solution by alternating between singular value decomposition and a modified version of non-maximal suppression that is constrained by the output of the overlap detector. Furthermore, we detail an HMM-DNN based overlap detector that performs frame-level classification and enforces duration constraints through HMM state transitions. Our method achieves a test diarization error rate (DER) of 24.0% on the mixed-headset setting of the AMI meeting corpus, a relative improvement of 15.2% over a strong agglomerative hierarchical clustering baseline, and compares favorably with other overlap-aware diarization methods. Further analysis on the LibriCSS data demonstrates the effectiveness of the proposed method in high-overlap conditions.
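The pipeline summarized in the abstract (eigendecomposition of a relaxed problem, followed by alternating SVD and constrained non-maximal suppression) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the rotation initialization, the fixed iteration count, the limit of two speakers per overlapping segment, and the toy data are all assumptions made for illustration, and the overlap detector is replaced by a hand-written boolean mask.

```python
import numpy as np


def constrained_nms(scores, overlap_mask):
    """Keep the best speaker per segment, plus the second best
    for segments flagged as overlapping (assumed limit: 2 speakers)."""
    order = np.argsort(-scores, axis=1)          # columns sorted by score, descending
    rows = np.arange(scores.shape[0])
    labels = np.zeros_like(scores)
    labels[rows, order[:, 0]] = 1.0
    labels[rows[overlap_mask], order[overlap_mask, 1]] = 1.0
    return labels


def overlap_aware_spectral_clustering(affinity, num_speakers, overlap_mask, num_iters=20):
    """Hypothetical sketch: relaxed spectral clustering, then discretization
    by alternating constrained NMS and an SVD-based rotation update."""
    # Relaxed problem: top eigenvectors of the normalized affinity matrix.
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))   # assumes positive degrees
    normalized = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(normalized)             # eigenvalues in ascending order
    relaxed = eigvecs[:, -num_speakers:] * d_inv_sqrt[:, None]
    relaxed /= np.linalg.norm(relaxed, axis=1, keepdims=True)

    # Initialize the rotation from rows that are as mutually orthogonal as possible.
    rotation = np.zeros((num_speakers, num_speakers))
    rotation[:, 0] = relaxed[0]
    accumulated = np.zeros(relaxed.shape[0])
    for k in range(1, num_speakers):
        accumulated += np.abs(relaxed @ rotation[:, k - 1])
        rotation[:, k] = relaxed[np.argmin(accumulated)]

    # Alternate: discretize with constrained NMS, then refit the rotation with SVD.
    for _ in range(num_iters):
        labels = constrained_nms(relaxed @ rotation, overlap_mask)
        u, _, vt = np.linalg.svd(labels.T @ relaxed)
        rotation = vt.T @ u.T
    return labels


# Toy data (assumed): three segments per speaker and one overlapping segment.
embeddings = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3 + [[2 ** -0.5, 2 ** -0.5]])
affinity = embeddings @ embeddings.T                    # cosine similarity
overlap_mask = np.array([False] * 6 + [True])           # stand-in for the overlap detector
labels = overlap_aware_spectral_clustering(affinity, 2, overlap_mask)
```

Each row of `labels` is a segment and each column a speaker; a row with two ones marks a segment assigned to two speakers at once.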
