Paper Title

A Dataset of German Legal Documents for Named Entity Recognition

Authors

Elena Leitner, Georg Rehm, Julián Moreno-Schneider

Abstract

We describe a dataset developed for Named Entity Recognition in German federal court decisions. It consists of approx. 67,000 sentences with over 2 million tokens. The resource contains 54,000 manually annotated entities, mapped to 19 fine-grained semantic classes: person, judge, lawyer, country, city, street, landscape, organization, company, institution, court, brand, law, ordinance, European legal norm, regulation, contract, court decision, and legal literature. The legal documents were, furthermore, automatically annotated with more than 35,000 TimeML-based time expressions. The dataset, which is available under a CC-BY 4.0 license in the CoNLL-2002 format, was developed for training an NER service for German legal documents in the EU project Lynx.
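
Since the abstract names the CoNLL-2002 format, a minimal sketch of reading such data may help. It assumes the common layout of one token per line with the tag in the last whitespace-separated column, blank lines between sentences, and BIO-style tags. The function names, the sample sentences, and the labels `JUDGE`, `CITY`, and `ORG` are illustrative assumptions, not the dataset's actual tag set.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, tag) pairs


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group token/tag lines into sentences; a blank line ends a sentence."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        # Assumption: first column is the token, last column is the tag.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Collect (entity text, label) spans from BIO tags."""
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    label = ""
    for token, tag in sentence:
        if tag.startswith("B-"):
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [token], tag[2:]
        elif tag.startswith("I-") and tokens and tag[2:] == label:
            tokens.append(token)
        else:
            # "O" tags (and stray I- tags without a matching B-) close any open span.
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [], ""
    if tokens:
        entities.append((" ".join(tokens), label))
    return entities


# Hypothetical sample; labels are placeholders, not the dataset's real classes.
SAMPLE = """\
Der O
Richter O
Müller B-JUDGE
entschied O
in O
Berlin B-CITY
. O

Die O
Europäische B-ORG
Union I-ORG
klagte O
. O
"""

sentences = read_conll(SAMPLE.splitlines())
for sent in sentences:
    print(extract_entities(sent))
```

To read a real file, the same `read_conll` function can be passed an open file handle instead of `SAMPLE.splitlines()`. The column layout should be checked against the dataset's own documentation first.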
