On Arabic Stop-Words: A Comprehensive List and a Dedicated Morphological Analyzer

Cited by: 1
Authors
Namly, Driss [1]
Bouzoubaa, Karim [1]
Tajmout, Rachida [1]
Laadimi, Ali [2]
Affiliations
[1] Mohammed V Univ Rabat, Mohammadia Sch Engineers, Rabat, Morocco
[2] Mohammed V Univ Rabat, Fac Arts & Humanities, Rabat, Morocco
Keywords
Natural Language Processing; Arabic language; Information retrieval; Stop-words; Hidden Markov Model; Viterbi algorithm
DOI
10.1007/978-3-030-32959-4_11
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Stop-word detection is a key preprocessing step in many Natural Language Processing applications. For Arabic, it is a complex task because of the language's rich morphology and the lack of a commonly accepted stop-words list. In this paper, we compile a new comprehensive Arabic stop-words list along with a stop-words analyzer that combines the list with a machine-learning approach to select the most probable stop-word. The first step of our approach provides a context-free analysis; the second step uses a Hidden Markov Model to detect the most appropriate stop-word according to the sentence context. In evaluation, the analyzer achieves an accuracy of over 97%, outperforming state-of-the-art analyzers.
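The two-step idea in the abstract (context-free candidates first, then an HMM choosing among them in context) can be illustrated with a small Viterbi decoder. This is only a sketch under stated assumptions, not the authors' analyzer: the function name `viterbi`, the tag names, the example sentence, and every probability below are hypothetical and chosen for illustration. Emission probabilities are assumed uniform over each token's candidate list, so only start and transition probabilities are scored.

```python
import math


def viterbi(candidates, start_p, trans_p, floor=1e-6):
    """Pick the most probable tag sequence from per-token candidate tags.

    candidates: list of non-empty tag lists, one per token (step 1 output).
    start_p:    dict tag -> probability of starting a sentence with that tag.
    trans_p:    dict (prev_tag, tag) -> transition probability.
    floor:      small probability used for unseen events (an assumption).
    """
    # best[i][tag] = (log-probability of the best path ending in tag, previous tag)
    best = [{t: (math.log(start_p.get(t, floor)), None) for t in candidates[0]}]
    for cands in candidates[1:]:
        prev = best[-1]
        column = {}
        for t in cands:
            column[t] = max(
                (prev[p][0] + math.log(trans_p.get((p, t), floor)), p)
                for p in prev
            )
        best.append(column)

    # Backtrack from the best final tag to recover the full path.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(best) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Hypothetical example: the undiacritized token "من" can be read as a
# preposition ("from") or as a pronoun ("who"); context should decide.
tokens = ["خرج", "من", "البيت"]
candidates = [["VERB"], ["PREP", "PRON"], ["NOUN"]]   # step 1: context-free
start_p = {"VERB": 0.5}                               # toy values
trans_p = {
    ("VERB", "PREP"): 0.4,
    ("VERB", "PRON"): 0.1,
    ("PREP", "NOUN"): 0.8,
    ("PRON", "NOUN"): 0.1,
}
path = viterbi(candidates, start_p, trans_p)          # step 2: in context
print(list(zip(tokens, path)))
```

Restricting each column of the lattice to the candidates from the first step keeps the search small, which is one plausible reason to pair a lexicon lookup with an HMM; the record itself does not describe the model's states or parameters.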
Pages: 149-163
Number of pages: 15