A Dataset of German Legal Documents for Named Entity Recognition

Authors
Leitner, Elena [1]
Rehm, Georg [1]
Moreno-Schneider, Julian [1]
Affiliation
[1] DFKI GmbH, Alt Moabit 91c, D-10559 Berlin, Germany
Funding
EU Horizon 2020
Keywords
Named Entity Recognition; NER; Legal Documents; Legal Domain; Corpus Creation; Corpus Annotation
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
We describe a dataset developed for Named Entity Recognition in German federal court decisions. It consists of approximately 67,000 sentences with over 2 million tokens. The resource contains 54,000 manually annotated entities, mapped to 19 fine-grained semantic classes: person, judge, lawyer, country, city, street, landscape, organization, company, institution, court, brand, law, ordinance, European legal norm, regulation, contract, court decision, and legal literature. The legal documents were, furthermore, automatically annotated with more than 35,000 TimeML-based time expressions. The dataset, which is available under a CC-BY 4.0 license in the CoNLL-2002 format, was developed for training an NER service for German legal documents in the EU project Lynx.
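As an illustration of the CoNLL-2002 format mentioned in the abstract, the following is a minimal sketch of how such a file might be read. It assumes one token per line with the tag in the last whitespace-separated column, a blank line between sentences, and BIO-style tags. The sample sentences and the tag names (COURT, CITY, COMPANY) are invented for illustration and are not taken from the dataset; the function names read_conll and extract_entities are likewise hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Sentence = List[Tuple[str, str]]  # list of (token, tag) pairs

# Made-up sample in token-per-line layout; NOT taken from the dataset.
# Tag names are placeholders, not the dataset's actual label set.
SAMPLE = """\
Der O
Bundesgerichtshof B-COURT
hat O
in O
Karlsruhe B-CITY
entschieden O
. O

Die O
Firma O
Muster B-COMPANY
GmbH I-COMPANY
klagte O
. O
"""


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group 'token ... tag' rows into sentences separated by blank lines."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        # First column is the token, last column is the entity tag.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Collect (entity text, class) spans from BIO tags."""
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    label: Optional[str] = None
    for token, tag in sentence:
        continues = tag.startswith("I-") and tag[2:] == label
        if continues:
            tokens.append(token)
            continue
        # Any other tag ends the open span.
        if tokens:
            entities.append((" ".join(tokens), label))
            tokens, label = [], None
        # B-X (or a stray I-X) opens a new span.
        if tag.startswith(("B-", "I-")):
            tokens, label = [token], tag[2:]
    if tokens:
        entities.append((" ".join(tokens), label))
    return entities


if __name__ == "__main__":
    for sent in read_conll(SAMPLE.splitlines()):
        print(extract_entities(sent))
```

To read a real file instead of the sample, an open file handle could be passed to read_conll in place of SAMPLE.splitlines(); the column layout of the actual release should be checked first, since it is assumed here.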
Pages: 4478-4485
Number of pages: 8