Arabic Text Classification: New study

Times cited: 0
Authors
Ayed, Rabii [1]
Labidi, Mohamed [2]
Maraoui, Mohsen [3]
Affiliations
[1] ISG Sousse, Computat Math Lab, Monastir, Tunisia
[2] ISITCom Hammam Sousse, LaTICE Lab, Monastir, Tunisia
[3] Fac Sci, Computat Math Lab, Monastir, Tunisia
Keywords
Arabic text classification; Natural Language Processing; classification algorithms; Artificial Intelligence
DOI
Not available
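Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809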
Abstract
Text classification performance depends heavily on the feature type, that is, the kind of unit extracted from the text and presented to the classification algorithm. Character N-grams, word roots, word stems, and full words have all been used as features for Arabic text classification. A survey of the current literature shows no prior study of the effect of root N-grams and stem N-grams (N consecutive roots or stems) on Arabic text classification performance. We therefore conducted 108 experiments. Three feature types (1-grams, 2-grams, and 3-grams) of roots, stems, and full words were used. Chi-square was employed for feature selection, with three thresholds for the number of features (100, 500, and 1000). Term frequency-inverse document frequency (TF-IDF) was used as the representation scheme. Three classifiers were applied: Naive Bayes, K-Nearest Neighbor, and Support Vector Machine. The results show that root 1-grams give better classification performance for Arabic text than stem or word N-grams. They also show that classification performance decreases as the number of N-grams increases, and that the Support Vector Machine outperforms Naive Bayes and K-Nearest Neighbor with 1-grams. Root 2-grams, however, achieved the best performance when K-Nearest Neighbor was used, and root 3-grams achieved the best performance when the Support Vector Machine was used.
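The feature pipeline named in the abstract (N-grams of consecutive roots or stems, chi-square selection, TF-IDF weighting) can be illustrated with a minimal sketch. This record does not include the authors' implementation, so everything below is an assumption: the function names, the transliterated tokens, the class labels, and the specific TF-IDF variant (raw count times log(N/df)) are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def token_ngrams(tokens, n):
    """Return N-grams built from n consecutive tokens (roots, stems, or words)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """Weight each feature by raw count times log(N / document frequency).

    This is one common TF-IDF variant; the paper's exact formula is not given.
    """
    n_docs = len(docs)
    df = Counter(feat for doc in docs for feat in set(doc))
    return [
        {feat: count * math.log(n_docs / df[feat])
         for feat, count in Counter(doc).items()}
        for doc in docs
    ]


def chi_square(docs, labels, feature, target):
    """Chi-square statistic of the 2x2 table: feature present/absent vs. class."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        present = feature in doc
        if present and label == target:
            a += 1          # feature present, target class
        elif present:
            b += 1          # feature present, other class
        elif label == target:
            c += 1          # feature absent, target class
        else:
            d += 1          # feature absent, other class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


# Hypothetical, already-stemmed documents (placeholder root tokens).
docs_tokens = [
    ["ktb", "drs", "elm"],
    ["ktb", "drs", "lab"],
    ["lab", "krh", "fwz"],
    ["lab", "fwz", "krh"],
]
labels = ["edu", "edu", "sport", "sport"]

root_bigrams = [token_ngrams(tokens, 2) for tokens in docs_tokens]
weights = tf_idf(root_bigrams)
score = chi_square(root_bigrams, labels, "ktb drs", "edu")
```

In a full experiment, features would be ranked by their chi-square score and only the top 100, 500, or 1000 kept before the TF-IDF vectors are passed to a classifier; that ranking step and the classifiers themselves are omitted here.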
Pages: 7