Arabic Text Classification: New study

Cited by: 0
Authors
Ayed, Rabii [1]
Labidi, Mohamed [2]
Maraoui, Mohsen [3]
Affiliations
[1] ISG Sousse, Computat Math Lab, Monastir, Tunisia
[2] ISITCom Hammam Sousse, LaTICE Lab, Monastir, Tunisia
[3] Fac Sci, Computat Math Lab, Monastir, Tunisia
Keywords
Arabic text classification; Natural Language Processing; classification algorithms; Artificial Intelligence
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract
Text classification performance depends heavily on the feature type, that is, the unit extracted from the text and presented to the classification algorithm. Character N-grams, word roots, word stems, and full words have all been used as features for Arabic text classification. A survey of the current literature shows no prior study of the effect of root N-grams and stem N-grams (N consecutive roots or stems) on Arabic text classification performance. We therefore conducted 108 experiments. Three feature types (1-grams, 2-grams, and 3-grams) of roots, stems, and full words were used. Chi-square was employed for feature selection, with three thresholds on the number of features (100, 500, and 1000). Term frequency-inverse document frequency (TF-IDF) was used as the representation scheme. Three classifiers were applied: Naive Bayes, K-Nearest Neighbor, and Support Vector Machine. The results show that root 1-grams yield better performance for Arabic text classification than stem or word N-grams. Performance also decreases as the number of N-grams increases. With 1-grams, the Support Vector Machine outperforms Naive Bayes and K-Nearest Neighbor. With K-Nearest Neighbor, however, root 2-grams achieved the best performance, whereas with the Support Vector Machine, root 3-grams achieved the best performance.
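The feature pipeline described in the abstract (N-grams of consecutive roots or stems, chi-square feature selection, TF-IDF weighting) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the transliterated placeholder tokens standing in for the output of a root extractor, the TF-IDF variant (raw count times ln(N/df)), and the choice to rank each feature by its maximum chi-square score over the classes are all assumptions made for illustration. Root extraction and the classification step are omitted.

```python
import math
from collections import Counter


def token_ngrams(tokens, n):
    """Return the N-grams (N consecutive tokens) of a token sequence."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs):
    """Weight each feature by raw count times ln(N / document frequency).

    This particular TF-IDF variant is an assumption; the abstract does not
    say which one was used.
    """
    n_docs = len(docs)
    df = Counter(f for doc in docs for f in set(doc))
    return [
        {f: count * math.log(n_docs / df[f]) for f, count in Counter(doc).items()}
        for doc in docs
    ]


def chi_square(docs, labels, feature, label):
    """Chi-square statistic of the 2x2 table: feature present vs. class membership."""
    a = b = c = d = 0
    for doc, y in zip(docs, labels):
        present = feature in doc
        if present and y == label:
            a += 1  # feature present, document in the class
        elif present:
            b += 1  # feature present, document outside the class
        elif y == label:
            c += 1  # feature absent, document in the class
        else:
            d += 1  # feature absent, document outside the class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_features(docs, labels, k):
    """Keep the k features with the highest chi-square score over any class.

    Taking the maximum over classes is an assumption of this sketch.
    Ties are broken alphabetically so the result is deterministic.
    """
    vocab = {f for doc in docs for f in doc}
    classes = set(labels)
    scores = {f: max(chi_square(docs, labels, f, c) for c in classes) for f in vocab}
    return sorted(vocab, key=lambda f: (-scores[f], f))[:k]


# Hypothetical toy corpus: each document is already a sequence of root tokens.
root_docs = [
    ["ktb", "drs", "elm"],
    ["drs", "elm", "ktb"],
    ["leb", "frq", "fwz"],
    ["frq", "fwz", "leb"],
]
labels = ["education", "education", "sport", "sport"]

# Root 2-grams, then chi-square selection, then TF-IDF on the kept features.
bigram_docs = [token_ngrams(doc, 2) for doc in root_docs]
selected = select_features(bigram_docs, labels, k=2)
kept = set(selected)
weighted = tfidf([[f for f in doc if f in kept] for doc in bigram_docs])
print(selected)
print(weighted)
```

In this sketch, switching between 1-grams, 2-grams, and 3-grams only changes the n argument of token_ngrams, and the feature-count thresholds from the abstract (100, 500, 1000) correspond to the k argument of select_features.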
Pages: 7