Named Entity Recognition for Partially Annotated Datasets

Authors
Strobl, Michael [1]
Trabelsi, Amine [2]
Zaiane, Osmar [1]
Affiliations
[1] Univ Alberta, Edmonton, AB, Canada
[2] Lakehead Univ, Thunder Bay, ON, Canada
Keywords
Named entity recognition; Partially annotated datasets
DOI
10.1007/978-3-031-08473-7_27
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The most common named entity recognizers are sequence taggers trained on fully annotated corpora, i.e., corpora in which the class of every word is known for all entities. Partially annotated corpora, in which some but not all entities of some types are annotated, are too noisy for training sequence taggers: the same entity may be annotated with its true type in one place but left unannotated in another, which misleads the tagger. We therefore compare three training strategies for partially annotated datasets, along with an approach for deriving datasets for new entity classes from Wikipedia without time-consuming manual annotation. To verify that our data acquisition and training approaches are plausible, we manually annotated test datasets for two new classes, food and drugs.
Pages: 299-306
Number of pages: 8