Title
Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre
Authors
Abstract
This paper describes the process of building an annotated corpus and training models for classical French literature, with a focus on theatre, and particularly comedies in verse. It was originally developed as a preliminary step to the stylometric analyses presented in Cafiero and Camps [2019]. Using a recent lemmatiser based on neural networks together with a CRF tagger makes it possible to achieve accuracies beyond the current state of the art on the in-domain test. The models also prove robust in out-of-domain tests, i.e. on texts as late as 20th-century novels.