Paper Title
Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection
Paper Authors
Paper Abstract
As applications of automatic speech recognition (ASR) have grown in recent years, it has become essential to automatically insert punctuation marks and remove disfluencies in transcripts, both to improve their readability and to improve the performance of subsequent applications such as machine translation and dialogue systems. In this paper, we propose a Controllable Time-delay Transformer (CT-Transformer) model that jointly completes the punctuation prediction and disfluency detection tasks in real time. The CT-Transformer model facilitates freezing partial outputs with a controllable time delay, fulfilling the real-time constraints on partial decoding required by subsequent applications. We further propose a fast decoding strategy that minimizes latency while maintaining competitive performance. Experimental results on the IWSLT2011 benchmark dataset and an in-house Chinese annotated dataset demonstrate that the proposed approach outperforms the previous state-of-the-art models on F-scores and achieves a competitive inference speed.
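The abstract does not spell out how the time delay is controlled, so the following is only an illustrative sketch of one common way to bound lookahead in self-attention: each position may attend to all earlier tokens and to at most a fixed number of future tokens. The function names `lookahead_mask` and `frozen_prefix_len`, the parameters `max_future` and `num_layers`, and the assumption that the total delay is `num_layers * max_future` are hypothetical and are not taken from the paper.

```python
from typing import List


def lookahead_mask(seq_len: int, max_future: int) -> List[List[bool]]:
    """Build a boolean attention mask with bounded right context.

    mask[i][j] is True when position i is allowed to attend to position j,
    i.e. when j is at most `max_future` tokens ahead of i.
    (Hypothetical helper; not the paper's API.)
    """
    return [
        [j <= i + max_future for j in range(seq_len)]
        for i in range(seq_len)
    ]


def frozen_prefix_len(num_tokens: int, max_future: int, num_layers: int = 1) -> int:
    """Number of leading positions whose outputs can no longer change.

    Assumption: stacking `num_layers` layers with the same per-layer
    lookahead widens the effective right context to
    num_layers * max_future tokens. A position is final once all tokens
    in that window have arrived.
    """
    total_delay = num_layers * max_future
    return max(0, num_tokens - total_delay)


if __name__ == "__main__":
    # With 4 tokens and a lookahead of 1, token 0 sees tokens 0 and 1 only.
    for row in lookahead_mask(4, 1):
        print(["x" if allowed else "." for allowed in row])
    # After 10 streamed tokens, a 2-token delay leaves the last 2 open.
    print(frozen_prefix_len(10, 2))
```

Under these assumptions, a smaller `max_future` lets a streaming consumer receive stable output sooner, at the cost of less right context per prediction. That is the kind of trade-off a controllable delay exposes.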