Corpus and Models for Classical French Theatre

Paper title

Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre

Authors

Camps, Jean-Baptiste; Gabay, Simon; Fièvre, Paul; Clérice, Thibault; Cafiero, Florian

Abstract

This paper describes the process of building an annotated corpus and training models for classical French literature, with a focus on theatre, and particularly comedies in verse. It was originally developed as a preliminary step to the stylometric analyses presented in Cafiero and Camps [2019]. Using a recent lemmatiser based on neural networks together with a CRF tagger makes it possible to achieve accuracies beyond the current state of the art on the in-domain test, and the approach proves robust in out-of-domain tests, i.e. on texts up to 20th-century novels.
