Named Entity Corpus Construction using Wikipedia and DBpedia Ontology

Cited by: 0
Authors
Hahm, Younggyun [1]
Park, Jungyeul [2]
Lim, Kyungtae [3]
Kim, Youngsik [3]
Hwang, Dosam [4]
Choi, Key-Sun [1, 3]
Affiliations
[1] Korea Adv Inst Sci & Technol, Div Web Sci & Technol, Taejon, South Korea
[2] Univ Rennes 1, IRISA, UMR 6074, Lannion, France
[3] Korea Adv Inst Sci & Technol, Dept Comp Sci, Taejon, South Korea
[4] Yeungnam Univ, Dept Comp Sci, Gyongsan, Gyeongsangbuk D, South Korea
Keywords
Corpus; Named Entity Recognition; Linked Data;
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics];
Discipline classification codes
030303 ; 0501 ; 050102 ;
Abstract
In this paper, we propose a novel method to automatically build a named entity (NE) corpus based on the DBpedia ontology. Most named entity recognition (NER) systems require training data produced by time-consuming and labor-intensive annotation, so work on NER has thus far been limited to certain languages, such as English, that are resource-abundant in general. As an alternative, we suggest that the NE corpus generated by our proposed method can be used as training data. Our approach uses Wikipedia as raw text and the DBpedia data set for named entity disambiguation. Our method is language-independent and can easily be applied to the many languages for which Wikipedia and DBpedia are available. Throughout the paper, we demonstrate that our NE corpus is comparable in quality even to a manually annotated NE corpus.
Pages: 2565-2569
Number of pages: 5
Related papers
50 in total
  • [31] A Named Entity Labeler for German: exploiting Wikipedia and distributional clusters
    Chrupala, Grzegorz
    Klakow, Dietrich
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010,
  • [32] Wikipedia-based Named Entity Recognition System for Turkish
    Kucuk, Dogan
    Arici, Nursal
    JOURNAL OF POLYTECHNIC-POLITEKNIK DERGISI, 2016, 19 (03) : 325 - 332
  • [33] Exploiting Multilingual Wikipedia to Improve Arabic Named Entity Resources
    Biltawi, Mariam
    Awajan, Arafat
    Tedmori, Sara
    Al-Kouz, Akram
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2017, 14 (4A) : 598 - 607
  • [34] A Case Study on Start-up of Dataset Construction: In Case of Recipe Named Entity Corpus
    Yamakata, Yoko
    Tajima, Keishi
    Mori, Shinsuke
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 3564 - 3567
  • [35] Reducing Human Effort in Named Entity Corpus Construction Based on Ensemble Learning and Annotation Categorization
    Lu, Tingming
    Zhu, Man
    Gao, Zhiqiang
    NATURAL LANGUAGE UNDERSTANDING AND INTELLIGENT APPLICATIONS (NLPCC 2016), 2016, 10102 : 263 - 274
  • [37] Ontology Extraction from Software Requirements Using Named-Entity Recognition
    Kocerka, Jerzy
    Krzeslak, Michal
    Galuszka, Adam
    ADVANCES IN SCIENCE AND TECHNOLOGY-RESEARCH JOURNAL, 2022, 16 (03) : 207 - 212
  • [38] GraphNER: Using Corpus Level Similarities and Graph Propagation for Named Entity Recognition
    Sheikhshab, Golnar
    Starks, Elizabeth
    Karsan, Aly
    Chiu, Readman
    Sarkar, Anoop
    Birol, Inanc
    2018 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2018), 2018, : 229 - 238
  • [39] Building the ArabNER Corpus for Arabic Named Entity Recognition Using ChatGPT and Bard
    Mahdhaoui, Hassen
    Mars, Abdelkarim
    Zrigui, Mounir
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT I, ACIIDS 2024, 2024, 14795 : 159 - 170
  • [40] Mining Concepts from Wikipedia for Ontology Construction
    Cui, Gaoying
    Lu, Qin
    Li, Wenjie
    Chen, Yirong
    2009 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCES ON WEB INTELLIGENCE (WI) AND INTELLIGENT AGENT TECHNOLOGIES (IAT), VOL 3, 2009, : 287 - 290