A Dataset of German Legal Documents for Named Entity Recognition

Cited by: 0
Authors
Leitner, Elena [1]
Rehm, Georg [1]
Moreno-Schneider, Julian [1]
Affiliation
[1] DFKI GmbH, Alt Moabit 91c, D-10559 Berlin, Germany
Funding
EU Horizon 2020
Keywords
Named Entity Recognition; NER; Legal Documents; Legal Domain; Corpus Creation; Corpus Annotation
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
We describe a dataset developed for Named Entity Recognition in German federal court decisions. It consists of approximately 67,000 sentences with over 2 million tokens. The resource contains 54,000 manually annotated entities, mapped to 19 fine-grained semantic classes: person, judge, lawyer, country, city, street, landscape, organization, company, institution, court, brand, law, ordinance, European legal norm, regulation, contract, court decision, and legal literature. Furthermore, the legal documents were automatically annotated with more than 35,000 TimeML-based time expressions. The dataset, which is available under a CC-BY 4.0 license in the CoNLL-2002 format, was developed for training an NER service for German legal documents in the EU project Lynx.
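The abstract states that the corpus is distributed in the CoNLL-2002 format. As a minimal sketch, assuming one token per line with its tag in the last whitespace-separated column, blank lines between sentences, and BIO-style (B-/I-/O) tags, the Python below reads such text and counts entity mentions per class. The functions `read_conll` and `count_entities` are hypothetical helpers, and the tag names `COURT` and `LAW` in the sample are illustrative placeholders, not the dataset's actual label set.

```python
from collections import Counter
from typing import List, Tuple

# One sentence = list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]

# Hypothetical sample; the tag names are placeholders, not the corpus's labels.
SAMPLE = """Das O
Bundesverfassungsgericht B-COURT
hat O
entschieden O
. O

Gemäß O
§ B-LAW
433 I-LAW
BGB I-LAW
gilt O
Folgendes O
. O
"""


def read_conll(text: str) -> List[Sentence]:
    """Split CoNLL-style text into sentences of (token, tag) pairs.

    Assumes the token is in the first column, the tag in the last column,
    and that blank lines separate sentences.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def count_entities(sentences: List[Sentence]) -> Counter:
    """Count entity mentions per class.

    A new mention starts at any B- tag, or at an I- tag whose type differs
    from the previous token's, so both IOB1 and IOB2 input are handled.
    """
    counts: Counter = Counter()
    for sentence in sentences:
        prev_type = None
        for _, tag in sentence:
            if tag == "O" or "-" not in tag:
                prev_type = None
                continue
            prefix, ent_type = tag.split("-", 1)
            if prefix == "B" or ent_type != prev_type:
                counts[ent_type] += 1
            prev_type = ent_type
    return counts


if __name__ == "__main__":
    sents = read_conll(SAMPLE)
    print(len(sents), "sentences")
    print(count_entities(sents))
```

Counting at mention boundaries rather than per token means the multi-token span "§ 433 BGB" is counted once, which is how entity totals such as the 54,000 annotated entities above are normally reported.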
Pages: 4478–4485
Number of pages: 8