A real time Named Entity Recognition system for Arabic text mining

Authors
Harith Al-Jumaily
Paloma Martínez
José L. Martínez-Fernández
Erik Van der Goot
Affiliations
[1] Carlos III University of Madrid, Computer Science Department
[2] DAEDALUS – Data, Decisions and Language S.A.
[3] EC Joint Research Centre
Keywords
Arabic language; Text mining; Named Entity Recognition; Event detection; Morphological analysis; Root extraction
DOI
Not available
Abstract
Arabic is the most widely spoken language in the Arab World, and most people in the Islamic World understand Classical Arabic because it is the language of the Qur’an. The number of Arabic Internet users (in the Middle East and in North and East Africa) has increased considerably over the last decade. Even so, systems that automatically analyze Arabic digital resources are not as readily available as they are for English. This work therefore attempts to build a real-time Named Entity Recognition system that web applications can use to detect mentions of specific named entities and events in Arabic news. Because Arabic is a highly inflectional language, we try to minimize the impact of Arabic affixes on the quality of the pattern recognition model used to identify named entities. These patterns are built by processing and integrating several gazetteers, including DBpedia (http://dbpedia.org/About, 2009), GATE (A general architecture for text engineering, 2009) and ANERGazet (http://users.dsic.upv.es/grupos/nle/?file=kop4.php).
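The abstract does not spell out how affix handling and gazetteer patterns are combined. One common approach is sketched below as a hypothetical Python example. First, each token is expanded into candidate forms with one leading proclitic removed, for example ب ("in/with") or وال ("and the"). The candidates are then matched greedily against a merged gazetteer, trying the longest phrase first. The prefix list, the function names (`candidates`, `merge_gazetteers`, `tag`), the `min_stem` and `max_len` parameters and the toy entries are all assumptions made for illustration. They are not the authors' implementation or data. Suffixes and root extraction are not handled.

```python
"""Hypothetical sketch: gazetteer NER with light Arabic prefix stripping."""

# Illustrative proclitics (an assumption, not the paper's affix inventory).
# Sorted longest first so that "وال" (wa + al) is tried before "و" (wa).
PREFIXES = sorted(
    ["وال", "بال", "كال", "فال", "لل", "ال", "و", "ف", "ب", "ل", "ك"],
    key=len,
    reverse=True,
)


def candidates(token: str, min_stem: int = 2) -> list[str]:
    """Return the token itself plus each variant with one known prefix removed."""
    out = [token]
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= min_stem:
            out.append(token[len(prefix):])
    return out


def merge_gazetteers(*sources: dict[str, str]) -> dict[str, str]:
    """Merge name -> entity-type maps; the first source wins on conflicts."""
    merged: dict[str, str] = {}
    for source in sources:
        for name, etype in source.items():
            merged.setdefault(name, etype)
    return merged


def tag(text: str, gazetteer: dict[str, str], max_len: int = 3) -> list[tuple[str, str]]:
    """Greedy longest-match lookup; prefixes are stripped from the first token only."""
    tokens = text.split()
    found: list[tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        hit = None
        # Try the longest n-gram first so multi-word names beat their parts.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            rest = tokens[i + 1 : i + n]
            for head in candidates(tokens[i]):
                phrase = " ".join([head, *rest])
                if phrase in gazetteer:
                    hit = (phrase, gazetteer[phrase], n)
                    break
            if hit:
                break
        if hit:
            found.append((hit[0], hit[1]))
            i += hit[2]  # skip the tokens consumed by the match
        else:
            i += 1
    return found


if __name__ == "__main__":
    # Toy entries standing in for DBpedia-, GATE- or ANERGazet-derived lists.
    gaz = merge_gazetteers(
        {"القاهرة": "LOCATION", "الأمم المتحدة": "ORGANIZATION"},
        {"لندن": "LOCATION", "القاهرة": "LOCATION"},
    )
    print(tag("اجتمع الوفد بالأمم المتحدة في القاهرة ولندن", gaz))
```

Run as a script, the sketch should print three matches. It recovers الأمم المتحدة (ORGANIZATION) from بالأمم, finds القاهرة (LOCATION) directly, and recovers لندن (LOCATION) from ولندن. The sketch generates candidates instead of stripping prefixes eagerly. This keeps names that genuinely begin with a prefix letter, such as لندن, intact, because the unmodified token is always tried first.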
Pages: 543-563
Page count: 20