Paper Title

Quantifying French Document Complexity

Authors

Vincent Primpied, David Beauchemin, Richard Khoury

Abstract

Measuring a document's complexity level is an open challenge, particularly when one is working on a diverse corpus of documents rather than comparing several documents on a similar topic, or when working on a language other than English. In this paper, we define a methodology to measure the complexity of French documents, using a new general and diversified corpus of texts, the "French Canadian complexity level corpus", and a wide range of metrics. We compare different learning algorithms on this task and contrast their performance and their observations on which characteristics of the texts are most significant to their complexity. Our results show that our methodology gives a general-purpose measurement of text complexity in French.
