Paper title
An Annotated Corpus of Emerging Anglicisms in Spanish Newspaper Headlines
Paper authors
Paper abstract
The extraction of anglicisms (lexical borrowings from English) is relevant both for lexicographic purposes and for downstream NLP tasks. We introduce a corpus of European Spanish newspaper headlines annotated with anglicisms and a baseline model for anglicism extraction. In this paper we present: (1) a corpus of 21,570 newspaper headlines written in European Spanish and annotated with emergent anglicisms, and (2) a conditional random field (CRF) baseline model with handcrafted features for anglicism extraction. We present the newspaper headlines corpus, describe the annotation tagset and guidelines, and introduce a CRF model that can serve as a baseline for the task of detecting anglicisms. The presented work is a first step towards the creation of an anglicism extractor for Spanish newswire.
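To make the idea of a CRF with handcrafted features more concrete, the following is a minimal sketch of how a headline might be turned into per-token feature dictionaries and how predicted tags might be decoded into anglicism spans. Everything in it is an assumption made for illustration: the abstract does not list the actual features or the tagset, so the feature names, the BIO labels `B-ENG`/`I-ENG`/`O`, the helper functions, and the example headline are all hypothetical. Per-token dictionaries of this kind are a common input format for CRF toolkits, but no particular toolkit is assumed here.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a hypothetical handcrafted feature dict for the token at index i."""
    tok = tokens[i]
    lower = tok.lower()
    feats: Dict[str, object] = {
        "bias": 1.0,
        "lower": lower,
        "suffix3": lower[-3:],          # e.g. "-ing" is a frequent English ending
        "is_title": tok.istitle(),
        "is_upper": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
    }
    # Character trigrams, one boolean feature per trigram (assumed feature).
    for j in range(len(lower) - 2):
        feats["tri=" + lower[j:j + 3]] = True
    # One token of left and right context.
    if i == 0:
        feats["BOS"] = True
    else:
        feats["prev_lower"] = tokens[i - 1].lower()
    if i == len(tokens) - 1:
        feats["EOS"] = True
    else:
        feats["next_lower"] = tokens[i + 1].lower()
    return feats


def headline_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dicts for every token in one headline."""
    return [token_features(tokens, i) for i in range(len(tokens))]


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[str]:
    """Decode assumed BIO tags (B-ENG / I-ENG / O) into anglicism strings."""
    spans: List[str] = []
    current: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag == "B-ENG":
            if current:
                spans.append(" ".join(current))
            current = [tok]
        elif tag == "I-ENG" and current:
            current.append(tok)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Invented headline and tags, for illustration only (not taken from the corpus).
tokens = ["El", "streaming", "gana", "terreno", "al", "prime", "time"]
tags = ["O", "B-ENG", "O", "O", "O", "B-ENG", "I-ENG"]
features = headline_features(tokens)
anglicisms = bio_to_spans(tokens, tags)
```

In a setup like this, a sequence labeller would be trained on the feature dictionaries and gold tags of each headline, and `bio_to_spans` would be applied to its predictions to recover single-word and multiword anglicisms.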