Title
Vulgaris: Analysis of a Corpus for Middle-Age Varieties of Italian Language
Authors
Abstract
Italian is a Romance language with its roots in Vulgar Latin. The birth of modern Italian began in Tuscany around the 14th century and is mainly attributed to the works of Dante Alighieri, Francesco Petrarca and Giovanni Boccaccio, who are among the most acclaimed authors of the medieval age in Tuscany. However, because of the territory's past fragmentation, Italy has been characterized by a great variety of dialects, which are often only loosely related to each other. Italian has absorbed influences from many of these dialects, as well as from other languages, owing to the rule of parts of the country by foreign nations such as Spain and France. In this work we present Vulgaris, a project aimed at studying a corpus of Italian textual resources by authors from different regions, spanning the period from 1200 to 1600. Each composition is associated with its author, and authors are in turn grouped into families, i.e., groups sharing similar stylistic/chronological characteristics. Hence, the dataset is not only a valuable resource for studying the diachronic evolution of Italian and the differences between its dialects, but it is also useful for investigating stylistic differences between individual authors. We provide a detailed statistical analysis of the data and a corpus-driven study of dialectology and diachronic varieties.