A Dataset of German Legal Documents for Named Entity Recognition

Paper title

A Dataset of German Legal Documents for Named Entity Recognition

Authors

Elena Leitner, Georg Rehm, Julián Moreno-Schneider

Abstract

We describe a dataset developed for Named Entity Recognition in German federal court decisions. It consists of approx. 67,000 sentences with over 2 million tokens. The resource contains 54,000 manually annotated entities, mapped to 19 fine-grained semantic classes: person, judge, lawyer, country, city, street, landscape, organization, company, institution, court, brand, law, ordinance, European legal norm, regulation, contract, court decision, and legal literature. The legal documents were, furthermore, automatically annotated with more than 35,000 TimeML-based time expressions. The dataset, which is available under a CC-BY 4.0 license in the CoNLL-2002 format, was developed for training an NER service for German legal documents in the EU project Lynx.
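Since the abstract names the CoNLL-2002 format, a minimal sketch of reading such a file may help readers unfamiliar with it. It assumes the common layout of one token per line followed by a whitespace-separated BIO tag, with blank lines between sentences. The function names, the sample sentence, and the labels `PER` and `LOC` are hypothetical illustrations, not taken from the dataset.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, BIO tag) pairs


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group 'token tag' lines into sentences; blank lines end a sentence."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        # Assumption: first column is the token, last column is the tag.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Collect (entity text, label) spans from BIO tags.

    An I- tag that does not continue a span of the same label is ignored.
    """
    entities: List[Tuple[str, str]] = []
    words: List[str] = []
    label = ""
    for token, tag in sentence:
        if tag.startswith("B-"):
            if words:
                entities.append((" ".join(words), label))
            words, label = [token], tag[2:]
        elif tag.startswith("I-") and words and tag[2:] == label:
            words.append(token)
        else:
            if words:
                entities.append((" ".join(words), label))
            words, label = [], ""
    if words:
        entities.append((" ".join(words), label))
    return entities


# Hypothetical sample; the labels are placeholders, not the dataset's tag set.
SAMPLE = """Richter B-PER
Müller I-PER
wohnt O
in O
Berlin B-LOC
. O

Das O
Urteil O
"""

sentences = read_conll(SAMPLE.splitlines())
entities = extract_entities(sentences[0])
```

For a real file, `read_conll` could be passed an open file handle instead of `SAMPLE.splitlines()`, provided the file follows the same two-column layout.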

