Paper Title

An Annotated Corpus of Emerging Anglicisms in Spanish Newspaper Headlines

Author

Álvarez-Mellado, Elena

Abstract

The extraction of anglicisms (lexical borrowings from English) is relevant both for lexicographic purposes and for NLP downstream tasks. We introduce a corpus of European Spanish newspaper headlines annotated with anglicisms and a baseline model for anglicism extraction. In this paper we present: (1) a corpus of 21,570 newspaper headlines written in European Spanish annotated with emergent anglicisms and (2) a conditional random field baseline model with handcrafted features for anglicism extraction. We present the newspaper headlines corpus, describe the annotation tagset and guidelines and introduce a CRF model that can serve as baseline for the task of detecting anglicisms. The presented work is a first step towards the creation of an anglicism extractor for Spanish newswire.
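To make the idea of "handcrafted features" for a CRF tagger more concrete, the following is a minimal sketch of per-token feature extraction over a headline. The abstract does not list the features or the tagset, so everything here is an assumption for illustration: the function names `token_features` and `headline_features`, the specific features (lowercased form, suffix, casing, neighbouring words), the example headline, and the BIO-style label `B-ENG` are all hypothetical. The resulting list of feature dictionaries is the kind of input a typical CRF toolkit accepts, one dictionary per token.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build an illustrative feature dict for tokens[i] (hypothetical feature set)."""
    word = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "lower": word.lower(),
        "suffix3": word[-3:].lower(),  # endings such as "-ing" may hint at English
        "is_title": word.istitle(),
        "is_upper": word.isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
    }
    # Context features: neighbouring words, or sentence-boundary flags.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def headline_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Return one feature dict per token in the headline."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Made-up headline with hypothetical BIO labels (not taken from the corpus).
headline = "El podcast que arrasa en streaming".split()
labels = ["O", "B-ENG", "O", "O", "O", "B-ENG"]
X = headline_features(headline)
```

A sequence labeller would be trained on pairs such as `(X, labels)` across all headlines; the choice of features above is only one plausible starting point.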