An Automatically Generated Annotated Corpus for Albanian Named Entity Recognition

Cited by: 2
Authors
Hoxha, Klesti [1]
Baxhaku, Artur [1]
Affiliation
[1] Univ Tirana, Fac Nat Sci, Tirana 1001, Albania
Keywords
Named entity recognition; natural language processing; language corpora; semi-automatic annotation; information extraction
DOI
10.2478/cait-2018-0009
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Named Entity Recognition (NER) is an important task in many NLP pipelines. It has become especially important for the knowledge bases that power many of today's information retrieval systems. To cope with the high demand for annotated training corpora for supervised NER systems, automatic generation approaches have been proposed. In this paper we report on the first automatically generated NE-annotated corpus for Albanian. News articles from Albanian news media were used as the document source. They were automatically tagged using a custom gazetteer generated from the Albanian Wikipedia. Our evaluation results show that this corpus can be used as a baseline corpus for human-annotated ones or as a training corpus where no other is available.
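The gazetteer-based tagging mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy gazetteer entries, the `tag_tokens` function, the longest-match strategy, and the BIO tag scheme are all assumptions made for illustration, and the record gives no detail on how the actual system handles tokenization, inflection, or ambiguity.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer: token sequences mapped to an entity type.
# A real gazetteer would be derived from Albanian Wikipedia; these entries
# are illustrative only.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("Ismail", "Kadare"): "PER",
    ("Gjirokastër",): "LOC",
    ("Universiteti", "i", "Tiranës"): "ORG",
}


def tag_tokens(
    tokens: List[str], gazetteer: Dict[Tuple[str, ...], str]
) -> List[Tuple[str, str]]:
    """Assign BIO tags by greedy longest match against the gazetteer."""
    max_len = max((len(entry) for entry in gazetteer), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(tokens[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return list(zip(tokens, tags))


# Pre-tokenized example sentence (tokenization is assumed to be done upstream).
sentence = ["Ismail", "Kadare", "lindi", "në", "Gjirokastër", "."]
tagged = tag_tokens(sentence, GAZETTEER)
for token, tag in tagged:
    print(f"{token}\t{tag}")
```

Exact string matching of this kind misses inflected forms of a name, which is one reason a corpus produced this way is better treated as a baseline than as a gold standard.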
Pages: 95-108
Number of pages: 14