On Arabic Stop-Words: A Comprehensive List and a Dedicated Morphological Analyzer

Cited by: 1
Authors
Namly, Driss [1]
Bouzoubaa, Karim [1]
Tajmout, Rachida [1]
Laadimi, Ali [2]
Affiliations
[1] Mohammed V Univ Rabat, Mohammadia Sch Engineers, Rabat, Morocco
[2] Mohammed V Univ Rabat, Fac Arts & Humanities, Rabat, Morocco
Keywords
Natural Language Processing; Arabic language; Information retrieval; Stop-words; Hidden Markov Model; Viterbi algorithm
DOI
10.1007/978-3-030-32959-4_11
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Stop-word detection is a key preprocessing step and an important component of many Natural Language Processing applications. For Arabic, stop-word detection is a complex task because of the richness of Arabic morphology and the absence of a commonly accepted list. In this paper, we compile a new comprehensive Arabic stop-words list along with a stop-words analyzer that combines that list with a machine-learning-based approach to select the most probable stop-word. The first step of our approach provides a context-free analysis; the second step uses a Hidden Markov Model to detect the most appropriate stop-word according to the sentence context. In evaluation, the developed analyzer achieves an accuracy of over 97%, which outperforms state-of-the-art analyzers.
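The two-step idea in the abstract (a context-free lookup followed by Hidden Markov Model disambiguation, decoded with the Viterbi algorithm named in the keywords) can be illustrated with a minimal sketch. This is not the authors' analyzer: the tag names, the tiny lexicon, and every probability below are hypothetical values chosen only to make the example run.

```python
import math

# Step 1 (context-free): hypothetical lexicon mapping a surface form
# to its candidate stop-word tags. Not the paper's actual list.
LEXICON = {
    "من": ["PREP", "REL"],  # ambiguous: "min" (from) vs. "man" (who)
    "في": ["PREP"],
}
OTHER = "OTHER"  # tag assumed for tokens outside the stop-word list

# Made-up HMM parameters, for illustration only.
START = {"PREP": 0.3, "REL": 0.2, "OTHER": 0.5}
TRANS = {
    "PREP": {"PREP": 0.05, "REL": 0.15, "OTHER": 0.80},
    "REL": {"PREP": 0.10, "REL": 0.05, "OTHER": 0.85},
    "OTHER": {"PREP": 0.40, "REL": 0.10, "OTHER": 0.50},
}
EMIT = {("من", "PREP"): 0.3, ("من", "REL"): 0.4, ("في", "PREP"): 0.3}
DEFAULT_EMIT = 1e-3  # assumed emission probability for unlisted pairs


def candidates(token):
    """Context-free analysis: every tag the lexicon allows for a token."""
    return LEXICON.get(token, [OTHER])


def emit(token, tag):
    return EMIT.get((token, tag), DEFAULT_EMIT)


def viterbi(tokens):
    """Step 2: pick the most probable tag sequence given the context."""
    if not tokens:
        return []
    # Each column maps tag -> (best log-probability, previous tag).
    first = tokens[0]
    columns = [{
        tag: (math.log(START[tag]) + math.log(emit(first, tag)), None)
        for tag in candidates(first)
    }]
    for token in tokens[1:]:
        prev = columns[-1]
        column = {}
        for tag in candidates(token):
            score, back = max(
                (prev[p][0] + math.log(TRANS[p][tag]), p) for p in prev
            )
            column[tag] = (score + math.log(emit(token, tag)), back)
        columns.append(column)
    # Backtrack from the best final tag.
    tag = max(columns[-1], key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi(["جاء", "من", "المدرسة"]))  # ['OTHER', 'PREP', 'OTHER']
```

Restricting each position to the tags returned by `candidates` keeps the search space small: the model only has to choose among analyses the list already permits, which is one plausible way to combine a lexicon with an HMM.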
Pages: 149-163
Number of pages: 15
Related papers
41 in total
  • [1] Stop-words in Keyphrase Extraction Problem
    Popova, S.
    Kovriguina, L.
    Mouromtsev, D.
    Khodyrev, I.
    2013 14TH CONFERENCE OF OPEN INNOVATIONS ASSOCIATION (FRUCT), 2013, : 113 - 121
  • [2] Building Hybrid Stop-Words Technique with Normalization for Pre-Processing Arabic Text
    Atwan, Jaffar
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2022, 22 (07): : 65 - 74
  • [3] Automatic Removal of Visual Stop-Words
    Roman-Rangel, Edgar
    Marchand-Maillet, Stephane
    PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14), 2014, : 1145 - 1148
  • [4] Refined stop-words and morphological variants solutions applied to Hindi-English cross-lingual information retrieval
    Sharma, Vijay
    Mittal, Namita
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2019, 36 (03) : 2219 - 2227
  • [5] Influence of Stop-Words Removal on Sequence Patterns Identification within Comparable Corpora
    Munkova, Dasa
    Munk, Michal
    Vozar, Martin
    ICT INNOVATIONS 2013: ICT INNOVATIONS AND EDUCATION, 2014, 231 : 67 - 76
  • [6] Intelligent Tunisian Arabic Morphological Analyzer
    Karmani, Nadia B. M.
    Soussou, Hsan
    Alimi, Adel M.
    2016 IEEE/ACS 13TH INTERNATIONAL CONFERENCE OF COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2016,
  • [7] Arabic Stop Words: Towards a Generalisation and Standardisation
    Bouzoubaa, Karim
    Baidouri, Hicham
    Loukili, Taoufik
    El Yazidi, Taoufik
    KNOWLEDGE MANAGEMENT AND INNOVATION IN ADVANCING ECONOMIES-ANALYSES & SOLUTIONS, VOLS 1-3, 2009, : 1844 - +
  • [8] The Research of Sina Malicious Comments Detection Based on Semantic Information and Stop-Words Table
    Wang, Yanan
    Shi, Yijie
    PROCEEDINGS OF 2017 IEEE 7TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION (ICEIEC), 2017, : 444 - 447
  • [9] Web-based Arabic morphological analyzer
    Berri, J
    Zidoum, H
    Atif, Y
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2001, 2004 : 389 - 400
  • [10] MAGEAD: A Morphological Analyzer and Generator for the Arabic Dialects
    Habash, Nizar
    Rambow, Owen
    COLING/ACL 2006, VOLS 1 AND 2, PROCEEDINGS OF THE CONFERENCE, 2006, : 681 - 688