Paper Title

An Anchor-Free Detector for Continuous Speech Keyword Spotting

Authors

Zhiyuan Zhao, Chuanxin Tang, Chengdong Yao, Chong Luo

Abstract

Continuous Speech Keyword Spotting (CSKWS) is the task of detecting predefined keywords in continuous speech. In this paper, we regard CSKWS as a one-dimensional object detection task and propose a novel anchor-free detector, named AF-KWS, to solve the problem. AF-KWS directly regresses the center locations and lengths of the keywords through a single-stage deep neural network. In particular, AF-KWS is tailored for this speech task as we introduce an auxiliary unknown class to exclude other words from non-speech or silent background. We have built two benchmark datasets for CSKWS: LibriTop-20 and the continuous meeting analysis keywords (CMAK) dataset. Evaluations on these two datasets show that our proposed AF-KWS outperforms reference schemes by a large margin, and therefore provides a decent baseline for future research.
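
To make the idea of one-dimensional, anchor-free detection concrete, here is a minimal sketch of how center and length predictions could be decoded into keyword segments. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the network's output format, so the per-class center heatmap, the per-frame length vector, the local-maximum peak picking, the 0.5 threshold, and the name `decode_keywords` are all hypothetical.

```python
from typing import List, Tuple


def decode_keywords(
    heatmap: List[List[float]],  # assumed layout: [num_classes][num_frames], scores in [0, 1]
    lengths: List[float],        # assumed layout: [num_frames], keyword length in frames
    unknown_class: int,          # index of the auxiliary "unknown word" class
    threshold: float = 0.5,      # hypothetical score cutoff
) -> List[Tuple[int, float, float, float]]:
    """Return (class_id, start_frame, end_frame, score) for each detected keyword."""
    detections = []
    num_frames = len(lengths)
    for class_id, scores in enumerate(heatmap):
        if class_id == unknown_class:
            continue  # unknown words are modeled by the network but never reported
        for t in range(num_frames):
            score = scores[t]
            if score < threshold:
                continue
            left = scores[t - 1] if t > 0 else float("-inf")
            right = scores[t + 1] if t + 1 < num_frames else float("-inf")
            # Keep only local maxima, so each keyword yields a single center.
            if score >= left and score > right:
                half = lengths[t] / 2.0
                detections.append((class_id, t - half, t + half, score))
    return detections


# Toy input: class 0 is a keyword, class 1 is the unknown class.
heatmap = [
    [0.1, 0.2, 0.9, 0.3, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.8, 0.2],
]
lengths = [0.0, 0.0, 4.0, 0.0, 2.0, 0.0]
print(decode_keywords(heatmap, lengths, unknown_class=1))
```

In this toy input, the peak at frame 2 of class 0 becomes a segment spanning frames 0 to 4, while the strong peak in the unknown class is discarded. That mirrors the role the abstract gives the auxiliary class: absorbing non-keyword speech so that it is not confused with either a keyword or the background.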
