Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data

Paper Title

Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data

Authors

Alëna Aksënova, Zhehuai Chen, Chung-Cheng Chiu, Daan van Esch, Pavel Golik, Wei Han, Levi King, Bhuvana Ramabhadran, Andrew Rosenberg, Suzan Schwartz, Gary Wang

Abstract

Building inclusive speech recognition systems is a crucial step towards developing technologies that speakers of all language varieties can use. ASR systems must therefore work for everybody, regardless of the way they speak. Accomplishing this goal requires both data sets that represent language varieties and an understanding of which model configurations are most helpful in achieving robust understanding of all types of speech. However, there are not enough data sets for accented speech, and for the ones that are already available, more training approaches need to be explored to improve the quality of accented speech recognition. In this paper, we discuss recent progress towards developing more inclusive ASR systems, namely the importance of building new data sets representing linguistic diversity and of exploring novel training approaches to improve performance for all users. We address recent directions within benchmarking ASR systems for accented speech, measure the effects of wav2vec 2.0 pre-training on accented speech recognition, and highlight corpora relevant for diverse ASR evaluations.
