Named Entity Corpus Construction using Wikipedia and DBpedia Ontology

Cited by: 0
Authors
Hahm, Younggyun [1]
Park, Jungyeul [2]
Lim, Kyungtae [3]
Kim, Youngsik [3]
Hwang, Dosam [4]
Choi, Key-Sun [1, 3]
Affiliations
[1] Korea Adv Inst Sci & Technol, Div Web Sci & Technol, Taejon, South Korea
[2] Univ Rennes 1, IRISA, UMR 6074, Lannion, France
[3] Korea Adv Inst Sci & Technol, Dept Comp Sci, Taejon, South Korea
[4] Yeungnam Univ, Dept Comp Sci, Gyongsan, Gyeongsangbuk D, South Korea
Keywords
Corpus; Named Entity Recognition; Linked Data
DOI
Not available
Chinese Library Classification number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
In this paper, we propose a novel method to automatically build a named entity (NE) corpus based on the DBpedia ontology. Most named entity recognition (NER) systems require training data produced through time-consuming and labor-intensive annotation. Work on NER has thus far been limited to certain languages, such as English, that are generally resource-abundant. As an alternative, we suggest that the NE corpus generated by our proposed method can be used as training data. Our approach uses Wikipedia as raw text and the DBpedia data set for named entity disambiguation. Our method is language-independent and can easily be applied to the many languages for which Wikipedia and DBpedia are available. Throughout the paper, we demonstrate that our NE corpus is of comparable quality even to a manually annotated NE corpus.
Pages: 2565 - 2569
Page count: 5