Paper Title
Improving Data Driven Inverse Text Normalization using Data Augmentation
Paper Authors
Paper Abstract
Inverse text normalization (ITN) is used to convert the spoken-form output of an automatic speech recognition (ASR) system to a written form. Traditional handcrafted ITN rules can be complex to transcribe and maintain. Meanwhile, neural modeling approaches require high-quality, large-scale spoken-written pair examples in the same or a similar domain as the ASR system (in-domain data) to train. Both approaches require costly and complex annotation. In this paper, we present a data augmentation technique that effectively generates rich spoken-written numeric pairs from out-of-domain textual data with minimal human annotation. We empirically demonstrate that an ITN model trained with our data augmentation technique consistently outperforms an ITN model trained using only in-domain data across all numeric surfaces (e.g., cardinal, currency, and fraction), improving overall accuracy by 14.44%.
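The core augmentation idea, expanding written numbers found in text into their spoken form to obtain aligned spoken-written training pairs, can be sketched as follows. This is an illustrative assumption rather than the paper's actual implementation: the `spoken_cardinal` and `make_pairs` helpers are hypothetical and cover only English cardinals below 1000.

```python
# Illustrative sketch (not the paper's pipeline): given raw numbers
# harvested from out-of-domain text, produce (spoken, written) pairs
# of the kind an ITN model trains on, e.g. "forty two" -> "42".

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def spoken_cardinal(n: int) -> str:
    """Spell out 0 <= n < 1000 as a spoken-form cardinal."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + (" " + ONES[rest] if rest else "")
    hundreds, rest = divmod(n, 100)
    phrase = ONES[hundreds] + " hundred"
    return phrase + (" " + spoken_cardinal(rest) if rest else "")

def make_pairs(numbers):
    """Yield aligned (spoken, written) training pairs for an ITN model."""
    return [(spoken_cardinal(n), str(n)) for n in numbers]

for spoken, written in make_pairs([7, 42, 105, 360]):
    print(f"{spoken!r} -> {written!r}")
```

Other numeric surfaces mentioned in the abstract (currency, fractions) would follow the same pattern with surface-specific verbalization rules, which is where the minimal human annotation comes in.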