Contextual Text Categorization: An Improved Stemming Algorithm to Increase the Quality of Categorization in Arabic Text

Cited by: 0
Authors
Gadri, Said [1]
Moussaoui, Abdelouahab [1]
Affiliations
[1] Univ Ferhat Abbas Setif, Dept Comp Sci, Setif, Algeria
Keywords
Root extraction; information retrieval; bigrams; stemming; Arabic morphological rules; feature selection; ROOT;
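DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract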
One way to reduce the size of the term vocabulary in Arabic text categorization is to replace the different variants (forms) of a word with their common root. This process is called root-based stemming. Root extraction is more difficult in Arabic than in other languages because Arabic is a very rich language with a complex morphology. Many algorithms have been proposed in this field. Some are based on morphological rules and grammatical patterns, so they are fairly complex and require deep linguistic knowledge. Others are statistical; they are simpler and rely only on a few calculations. In this paper we propose an improved stemming algorithm that combines root extraction with the n-gram technique and returns the stems of Arabic words without using any morphological rules or grammatical patterns.
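The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a statistical, bigram-based root lookup might work. It is not the authors' method. It assumes contiguous character bigrams, the Dice coefficient as the similarity measure, and a small candidate root list supplied by the caller; the function names `char_bigrams`, `dice`, and `best_root` are hypothetical.

```python
def char_bigrams(word):
    """Return the set of contiguous character bigrams of a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice(a, b):
    """Dice coefficient between the bigram sets of two strings."""
    grams_a, grams_b = char_bigrams(a), char_bigrams(b)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def best_root(word, candidate_roots):
    """Pick the candidate root whose bigrams overlap most with the word.

    Assumption: the candidate list comes from an external root dictionary;
    no morphological rules or patterns are applied.
    """
    return max(candidate_roots, key=lambda root: dice(word, root))


# Illustrative data only: one word and three candidate roots.
word = "المكتبة"                  # "the library"
roots = ["كتب", "درس", "علم"]     # k-t-b, d-r-s, '-l-m
print(best_root(word, roots))
```

A purely contiguous bigram model misses roots whose letters are separated by infixes, which is one reason a practical stemmer would need a richer similarity measure than this sketch.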
Pages: 835-841
Number of pages: 7