Overview of HIPE-2022: Named Entity Recognition and Linking in Multilingual Historical Documents

Cited by: 5
Authors
Ehrmann, Maud [1]
Romanello, Matteo [2]
Najem-Meyer, Sven [1]
Doucet, Antoine [3]
Clematide, Simon [4]
Affiliations
[1] EPFL, Digital Humanities Lab, Vaud, Switzerland
[2] Univ Lausanne, Lausanne, Switzerland
[3] Univ La Rochelle, La Rochelle, France
[4] Univ Zurich, Dept Computat Linguist, Zurich, Switzerland
Keywords
Named entity recognition and classification; Entity linking; Historical texts; Information extraction; Digitised newspapers; Digital humanities
DOI
10.1007/978-3-031-13643-6_26
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents an overview of the second edition of HIPE (Identifying Historical People, Places and other Entities), a shared task on named entity recognition and linking in multilingual historical documents. Following the success of the first CLEF-HIPE-2020 evaluation lab, HIPE-2022 confronts systems with the challenges of dealing with more languages, learning domain-specific entities, and adapting to diverse annotation tag sets. This shared task is part of the ongoing efforts of the natural language processing and digital humanities communities to adapt and develop appropriate technologies to efficiently retrieve and explore information from historical texts. On such material, however, named entity processing techniques face the challenges of domain heterogeneity, input noisiness, dynamics of language, and lack of resources. In this context, the main objective of HIPE-2022, run as an evaluation lab of the CLEF 2022 conference, is to gain new insights into the transferability of named entity processing approaches across languages, time periods, document types, and annotation tag sets. Tasks, corpora, and results of participating teams are presented.
Pages: 423-446
Number of pages: 24