Paper title
A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation
Paper authors
Abstract
In this paper, we introduce a high-quality, large-scale benchmark dataset for English-Vietnamese speech translation. It contains 508 audio hours, comprising 331K triplets of (sentence-length audio, English source transcript sentence, Vietnamese target subtitle sentence). We also conduct empirical experiments using strong baselines and find that the traditional "Cascaded" approach still outperforms the modern "End-to-End" approach. To the best of our knowledge, this is the first large-scale English-Vietnamese speech translation study. We hope that both our publicly available dataset and our study can serve as a starting point for future research and applications in English-Vietnamese speech translation. Our dataset is available at https://github.com/VinAIResearch/PhoST