An Automatically Generated Annotated Corpus for Albanian Named Entity Recognition

Cited by: 2

Authors:

Hoxha, Klesti [1]

Baxhaku, Artur [1]

Affiliations:

[1] Univ Tirana, Fac Nat Sci, Tirana 1001, Albania

Source:

CYBERNETICS AND INFORMATION TECHNOLOGIES | 2018 / Vol. 18 / Issue 01

Keywords:

Named entity recognition; natural language processing; language corpora; semi-automatic annotation; information extraction;

DOI:

10.2478/cait-2018-0009

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Named Entity Recognition (NER) is an important task in many NLP pipelines. It has become especially important for the knowledge bases that power many of today's information retrieval systems. To cope with the high demand for annotated training corpora for supervised NER systems, automatic generation approaches have been proposed. In this paper we report on the first automatically generated NE-annotated corpus for Albanian. News articles from Albanian news media were used as the document source. They were automatically tagged using a custom gazetteer generated from the Albanian Wikipedia. Our evaluation results show that this corpus can be used as a baseline corpus for human-annotated ones or as a training corpus where no other is available.


Pages: 95-108

Number of pages: 14

