Paper title
Self-Training for End-to-End Speech Translation
Authors
Abstract
One of the main challenges for end-to-end speech translation is data scarcity. We leverage pseudo-labels generated from unlabeled audio by a cascade and an end-to-end speech translation model. This provides gains of 8.3 and 5.7 BLEU over a strong semi-supervised baseline on the MuST-C English-French and English-German datasets, reaching state-of-the-art performance. We investigate the effect of pseudo-label quality and show that our approach is more effective than simply pre-training the encoder on the speech recognition task. Finally, we demonstrate the effectiveness of self-training by generating pseudo-labels directly with an end-to-end model instead of a cascade model.
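The pseudo-labeling idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `make_cascade`, `pseudo_label`, `self_training_set`, `toy_asr`, and `toy_mt` are hypothetical, the toy lookup tables stand in for real ASR and MT models, and simply concatenating labeled and pseudo-labeled pairs is an assumption, since the abstract does not say how the two sources are combined.

```python
from typing import Callable, Iterable, List, Tuple

Audio = str                    # stand-in for an utterance, e.g. a file path (assumption)
Example = Tuple[Audio, str]    # (audio, target-language text)
Teacher = Callable[[Audio], str]


def make_cascade(asr: Callable[[Audio], str], mt: Callable[[str], str]) -> Teacher:
    """Build a cascade teacher: transcribe the audio, then translate the transcript."""
    def translate(audio: Audio) -> str:
        return mt(asr(audio))
    return translate


def pseudo_label(teacher: Teacher, unlabeled: Iterable[Audio]) -> List[Example]:
    """Pair each unlabeled utterance with the teacher's translation of it."""
    return [(audio, teacher(audio)) for audio in unlabeled]


def self_training_set(
    labeled: Iterable[Example], unlabeled: Iterable[Audio], teacher: Teacher
) -> List[Example]:
    """Concatenate real pairs with pseudo-labeled pairs (mixing strategy is assumed)."""
    return list(labeled) + pseudo_label(teacher, unlabeled)


# Toy stand-ins for trained models; a real system would call ASR and MT networks.
def toy_asr(audio: Audio) -> str:
    return {"utt1.wav": "hello", "utt2.wav": "thank you"}[audio]


def toy_mt(text: str) -> str:
    return {"hello": "bonjour", "thank you": "merci"}[text]


if __name__ == "__main__":
    teacher = make_cascade(toy_asr, toy_mt)
    labeled = [("utt0.wav", "bonsoir")]
    for pair in self_training_set(labeled, ["utt1.wav", "utt2.wav"], teacher):
        print(pair)
```

Because `Teacher` is any callable from audio to text, the same functions cover the final variant in the abstract: passing an end-to-end speech translation model as the teacher, in place of the cascade, gives the self-training setting.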