Paper title
CJaFr-v3: A Freely Available Filtered Japanese-French Aligned Corpus
Paper authors
Paper abstract
We present a free Japanese-French parallel corpus. It includes 15M aligned segments and was obtained by compiling and filtering several existing resources. In this paper, we describe the existing resources, their quantity and quality, the filtering we applied to improve the quality of the corpus, and the content of the ready-to-use corpus. We also evaluate the usefulness of this corpus and the quality of our filtering by training and evaluating some standard MT systems with it.