Paper Title
Pattern-aware Data Augmentation for Query Rewriting in Voice Assistant Systems
Paper Authors
Abstract
Query rewriting (QR) systems are widely used to reduce the friction caused by errors in a spoken language understanding pipeline. However, the underlying supervised models require a large number of labeled pairs, which are hard and costly to collect. We therefore propose an augmentation framework that learns patterns from existing training pairs and works inversely, generating rewrite candidates from rewrite labels, to compensate for insufficient QR training data. The proposed framework casts the augmentation problem as a sequence-to-sequence generation task and drives the optimization with a policy gradient technique for controllable rewarding. This approach goes beyond traditional heuristic or rule-based augmentation methods and is not constrained to predefined patterns of swapping or replacing words. Our experimental results show its effectiveness compared with a fully trained QR baseline and demonstrate its potential for boosting QR performance in low-resource domains or locales.
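
As a rough illustration of the policy gradient idea mentioned in the abstract, the standard REINFORCE formulation is sketched below. This is the textbook form, not necessarily the objective used in the paper. All symbols are assumptions introduced here: x is a rewrite label used as generator input, y is a generated candidate of length T, p_\theta is the sequence-to-sequence generator, R is the reward, and b is a baseline that does not depend on y.

```latex
% Generic REINFORCE sketch; the notation is assumed, not taken from the paper.
J(\theta) = \mathbb{E}_{y \sim p_\theta(\cdot \mid x)} \left[ R(x, y) \right]

\nabla_\theta J(\theta)
  = \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}
    \left[ \bigl( R(x, y) - b \bigr) \, \nabla_\theta \log p_\theta(y \mid x) \right],
\qquad
\log p_\theta(y \mid x) = \sum_{t=1}^{T} \log p_\theta(y_t \mid y_{<t}, x)
```

Under this reading, changing R changes which generated candidates are reinforced, which is one plausible interpretation of "controllable rewarding".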