Named Entity Corpus Construction using Wikipedia and DBpedia Ontology

Cited by: 0
Authors
Hahm, Younggyun [1]
Park, Jungyeul [2]
Lim, Kyungtae [3]
Kim, Youngsik [3]
Hwang, Dosam [4]
Choi, Key-Sun [1, 3]
Affiliations
[1] Korea Adv Inst Sci & Technol, Div Web Sci & Technol, Taejon, South Korea
[2] Univ Rennes 1, IRISA, UMR 6074, Lannion, France
[3] Korea Adv Inst Sci & Technol, Dept Comp Sci, Taejon, South Korea
[4] Yeungnam Univ, Dept Comp Sci, Gyongsan, Gyeongsangbuk D, South Korea
Keywords
Corpus; Named Entity Recognition; Linked Data;
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification codes
030303 ; 0501 ; 050102 ;
Abstract
In this paper, we propose a novel method to automatically build a named entity (NE) corpus based on the DBpedia ontology. Most named entity recognition (NER) systems require training data produced by time- and effort-consuming manual annotation, so work on NER has thus far been limited to a few resource-rich languages such as English. As an alternative, we suggest that the NE corpus generated by our method can be used as training data. Our approach uses Wikipedia as raw text and the DBpedia data set for named entity disambiguation. The method is language-independent and can easily be applied to any language for which both Wikipedia and DBpedia are available. We demonstrate that our NE corpus is of comparable quality even to the manually annotated NE corpus.
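The abstract says only that Wikipedia supplies the raw text and DBpedia is used to disambiguate named entities. As a hedged illustration (a hypothetical sketch, not the authors' procedure), one way these two signals could combine is to treat Wikipedia link markup as entity mentions, look up each linked article's DBpedia ontology class, and emit BIO-tagged tokens. Everything here is an assumption: the `TITLE_TO_CLASS` dictionary stands in for a real DBpedia type lookup, the three coarse tags (PER, LOC, ORG) and the `CLASS_TO_TAG` mapping are illustrative, and `annotate` is a hypothetical helper name.

```python
import re

# Hypothetical mapping from DBpedia ontology classes to coarse NE tags.
# A real system would also walk the class hierarchy (e.g. dbo:City -> dbo:Place).
CLASS_TO_TAG = {
    "dbo:Person": "PER",
    "dbo:Place": "LOC",
    "dbo:Organisation": "ORG",
}

# Toy stand-in for a DBpedia lookup: Wikipedia article title -> ontology class.
# Titles missing here are treated as "no type known" and tagged O.
TITLE_TO_CLASS = {
    "Seoul": "dbo:Place",
    "KAIST": "dbo:Organisation",
}

# Matches [[Title]] or [[Title|anchor text]] in wiki markup.
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def annotate(wikitext):
    """Convert wiki link markup into (token, BIO tag) pairs."""
    tokens = []
    pos = 0
    for m in LINK_RE.finditer(wikitext):
        # Plain text before the link is tagged O.
        for tok in wikitext[pos:m.start()].split():
            tokens.append((tok, "O"))
        title = m.group(1).strip()
        anchor = (m.group(2) or m.group(1)).strip()
        cls = TITLE_TO_CLASS.get(title)
        tag = CLASS_TO_TAG.get(cls) if cls else None
        # Linked anchor text gets B-/I- tags if its type is known, else O.
        for i, tok in enumerate(anchor.split()):
            if tag is None:
                tokens.append((tok, "O"))
            else:
                tokens.append((tok, ("B-" if i == 0 else "I-") + tag))
        pos = m.end()
    # Trailing plain text after the last link.
    for tok in wikitext[pos:].split():
        tokens.append((tok, "O"))
    return tokens


if __name__ == "__main__":
    text = "[[KAIST]] is located in [[Daejeon]] near [[Seoul|the capital]]."
    for token, tag in annotate(text):
        print(token, tag)
```

In this toy run, "Daejeon" is linked but absent from the lookup table, so it falls back to O; that shows why coverage of the type lookup directly limits how many mentions receive NE tags. Whitespace tokenization is a deliberate simplification; punctuation glued to plain text would need a real tokenizer.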
Pages: 2565-2569
Number of pages: 5