An Automatically Generated Annotated Corpus for Albanian Named Entity Recognition

Authors
Hoxha, Klesti [1]
Baxhaku, Artur [1]
Affiliations
[1] Univ Tirana, Fac Nat Sci, Tirana 1001, Albania
Keywords
Named entity recognition; natural language processing; language corpora; semi-automatic annotation; information extraction
DOI
10.2478/cait-2018-0009
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Named Entity Recognition (NER) is an important task in many NLP pipelines. It has become especially important for the knowledge bases that power many of today's information retrieval systems. To cope with the high demand for annotated training corpora for supervised NER systems, automatic generation approaches have been proposed. In this paper we report on the first automatically generated NE-annotated corpus for Albanian. News articles from Albanian news media were used as the document source. They were automatically tagged using a custom gazetteer generated from the Albanian Wikipedia. Our evaluation results show that this corpus can be used as a baseline corpus for human-annotated ones or as a training corpus where no other is available.
Pages: 95-108
Page count: 14