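Natural Language Processing Chains Inside a Cross-lingual Event-Centric Knowledge Pipeline for European Union Under-resourced Languages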

Paper Title

Natural Language Processing Chains Inside a Cross-lingual Event-Centric Knowledge Pipeline for European Union Under-resourced Languages

Authors

Diego Alves, Gaurish Thakkar, Marko Tadić

Abstract

This article presents the strategy for developing a platform containing Language Processing Chains for European Union languages, covering tokenization through parsing, as well as named entity recognition and sentiment analysis. These chains are part of the first step of an event-centric knowledge processing pipeline whose aim is to process multilingual media information about major events that can have an impact in Europe and the rest of the world. Because the availability of language resources differs from language to language, we have built this strategy in three steps, starting with processing chains for the well-resourced languages and finishing with the development of new modules for the under-resourced ones. In order to classify all European Union official languages in terms of resources, we have analysed the size of annotated corpora as well as the existence of pre-trained models in mainstream Language Processing tools, and we have combined this information with the classification proposed in the META-NET White Paper Series.
