A Dataset of German Legal Documents for Named Entity Recognition

Authors
Leitner, Elena [1]
Rehm, Georg [1]
Moreno-Schneider, Julian [1]
Affiliations
[1] DFKI GmbH, Alt Moabit 91c, D-10559 Berlin, Germany
Funding
EU Horizon 2020;
Keywords
Named Entity Recognition; NER; Legal Documents; Legal Domain; Corpus Creation; Corpus Annotation;
DOI
Not available
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
We describe a dataset developed for Named Entity Recognition in German federal court decisions. It consists of approx. 67,000 sentences with over 2 million tokens. The resource contains 54,000 manually annotated entities, mapped to 19 fine-grained semantic classes: person, judge, lawyer, country, city, street, landscape, organization, company, institution, court, brand, law, ordinance, European legal norm, regulation, contract, court decision, and legal literature. The legal documents were also automatically annotated with more than 35,000 TimeML-based time expressions. The dataset, which is available under a CC-BY 4.0 license in the CoNLL-2002 format, was developed for training an NER service for German legal documents in the EU project Lynx.
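As a rough illustration of what working with a CoNLL-2002-style file can look like, the sketch below reads token/tag lines separated by blank lines and groups BIO tags into entity spans. It is a minimal sketch and not the authors' tooling: it assumes one token per line with the tag in the last whitespace-separated column, and the sample sentences and the labels `COURT` and `CITY` are hypothetical stand-ins rather than the dataset's actual tag names.

```python
from typing import Iterable, List, Optional, Tuple

Sentence = List[Tuple[str, str]]  # (token, tag) pairs


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Split CoNLL-style lines into sentences; blank lines end a sentence."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        # Assumption: token is the first column, tag is the last column.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Group B-/I- tagged tokens into (entity text, label) pairs."""
    entities: List[Tuple[str, str]] = []
    words: List[str] = []
    label: Optional[str] = None
    for token, tag in sentence:
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_new:
            if words:
                entities.append((" ".join(words), label))
            words, label = [token], tag[2:]
        elif tag.startswith("I-"):
            words.append(token)
        else:  # an "O" tag closes any open entity
            if words:
                entities.append((" ".join(words), label))
            words, label = [], None
    if words:
        entities.append((" ".join(words), label))
    return entities


# Hypothetical sample; the tag names are illustrative only.
SAMPLE = """\
Das O
Landgericht B-COURT
Berlin I-COURT
wies O
die O
Klage O
ab O
. O

Der O
Kläger O
wohnt O
in O
Hamburg B-CITY
. O
"""

if __name__ == "__main__":
    for sent in read_conll(SAMPLE.splitlines()):
        print(extract_entities(sent))
```

For a real file, the same functions could be fed an open file handle instead of `SAMPLE.splitlines()`, after checking the column layout and tag inventory against the dataset's own documentation.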
Pages: 4478 - 4485
Number of pages: 8
Related papers
50 in total
  • [1] Named entity recognition in Vietnamese documents
    Tri Tran, Q.
    Thao Pham, T.X.
    Hung Ngo, Q.
    Dinh, Dien
    Collier, Nigel
    Progress in Informatics, 2007, (04): 5 - 13
  • [2] A Named Entity Recognition Dataset for Turkish
    Kucuk, Dilek
    Kucuk, Dogan
    Arici, Nursal
    2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016: 329 - 332
  • [3] On the Assessment of Deep Learning Models for Named Entity Recognition of Brazilian Legal Documents
    Albuquerque, Hidelberg O.
    Souza, Ellen
    Oliveira, Adriano L. I.
    Macedo, David
    Zanchettin, Cleber
    Vitorio, Douglas
    da Silva, Nadia F. F.
    de Carvalho, Andre C. P. L. F.
    PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT II, 2023, 14116 : 93 - 104
  • [4] LeNER-Br: A Dataset for Named Entity Recognition in Brazilian Legal Text
    Luz de Araujo, Pedro Henrique
    de Campos, Teofilo E.
    de Oliveira, Renato R. R.
    Stauffer, Matheus
    Couto, Samuel
    Bermejo, Paulo
    COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2018, 2018, 11122 : 313 - 323
  • [5] A RoBERTa-GlobalPointer-Based Method for Named Entity Recognition of Legal Documents
    Zhang, Xinrui
    Luo, Xudong
    Wu, Jiaye
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023
  • [6] Named Entity Recognition for Tamil Biomedical Documents
    Antony, Betina J.
    Mahalakshmi, G. S.
    2014 IEEE INTERNATIONAL CONFERENCE ON CIRCUIT, POWER AND COMPUTING TECHNOLOGIES (ICCPCT-2014), 2014: 1571 - 1577
  • [7] KazNERD: Kazakh Named Entity Recognition Dataset
    Yeshpanov, Rustem
    Khassanov, Yerbolat
    Varol, Huseyin Atakan
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022: 417 - 426
  • [8] DroNER: Dataset for drone named entity recognition
    Silalahi, Swardiantara
    Ahmad, Tohari
    Studiawan, Hudan
    DATA IN BRIEF, 2023, 48
  • [9] Arabic named entity recognition in crime documents
    Asharef, M.
    Omar, N.
    Albared, M.
    Journal of Theoretical and Applied Information Technology, 2012, 44 (01) : 1 - 6
  • [10] Evaluation of Named Entity Recognition in Handwritten Documents
    Villanova-Aparisi, David
    Martinez-Hinarejos, Carlos-D
    Romero, Veronica
    Pastor-Gadea, Moises
    DOCUMENT ANALYSIS SYSTEMS, DAS 2022, 2022, 13237 : 568 - 582