A Dataset of German Legal Documents for Named Entity Recognition

Authors
Leitner, Elena [1]
Rehm, Georg [1]
Moreno-Schneider, Julian [1]
Affiliation
[1] DFKI GmbH, Alt Moabit 91c, D-10559 Berlin, Germany
Funding
EU Horizon 2020
Keywords
Named Entity Recognition; NER; Legal Documents; Legal Domain; Corpus Creation; Corpus Annotation
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
We describe a dataset developed for Named Entity Recognition in German federal court decisions. It consists of approx. 67,000 sentences with over 2 million tokens. The resource contains 54,000 manually annotated entities, mapped to 19 fine-grained semantic classes: person, judge, lawyer, country, city, street, landscape, organization, company, institution, court, brand, law, ordinance, European legal norm, regulation, contract, court decision, and legal literature. In addition, the legal documents were automatically annotated with more than 35,000 TimeML-based time expressions. The dataset, which is available under a CC-BY 4.0 license in the CoNLL-2002 format, was developed for training an NER service for German legal documents in the EU project Lynx.
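The dataset is distributed in the CoNLL-2002 format. The following is a minimal sketch of how such a file could be read and its entity spans counted per class. It assumes one token per line with the entity tag in the last whitespace-separated column, blank lines between sentences, and BIO-style tags (B-X, I-X, O). The function names read_conll and count_entities are illustrative. The tag names COURT and JUDGE in the sample are hypothetical placeholders, not the dataset's actual labels.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# A sentence is a list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group 'token ... tag' lines into sentences; blank lines end a sentence."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        # Assumption: the token is the first column and the NER tag the last.
        parts = line.split()
        current.append((parts[0], parts[-1]))
    if current:  # the file may not end with a blank line
        sentences.append(current)
    return sentences


def count_entities(sentences: List[Sentence]) -> Counter:
    """Count entity spans per class under an assumed BIO tagging scheme."""
    counts: Counter = Counter()
    for sentence in sentences:
        prev_class = None
        for _, tag in sentence:
            if tag.startswith("B-"):
                counts[tag[2:]] += 1
                prev_class = tag[2:]
            elif tag.startswith("I-"):
                # An I- tag that does not continue a span of the same class
                # is treated as the start of a new span.
                if prev_class != tag[2:]:
                    counts[tag[2:]] += 1
                prev_class = tag[2:]
            else:
                prev_class = None
    return counts


# Hypothetical two-sentence sample; the tag names are placeholders.
sample = """Das O
Bundesverfassungsgericht B-COURT
hat O
entschieden O
. O

Richter B-JUDGE
Müller I-JUDGE
stimmte O
zu O
. O
"""

print(count_entities(read_conll(sample.splitlines())))
# Counter({'COURT': 1, 'JUDGE': 1})
```

For a real file, the same functions could be applied to an open text handle (opened with encoding="utf-8", since the German text contains umlauts), because read_conll accepts any iterable of lines.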
Pages: 4478–4485
Number of pages: 8