Arabic Text Classification: New study

Authors
Ayed, Rabii [1]
Labidi, Mohamed [2]
Maraoui, Mohsen [3]
Affiliations
[1] ISG Sousse, Computat Math Lab, Monastir, Tunisia
[2] ISITCom Hammam Sousse, LaTICE Lab, Monastir, Tunisia
[3] Fac Sci, Computat Math Lab, Monastir, Tunisia
Keywords
Arabic text classification; Natural Language Processing; classification algorithms; Artificial Intelligence
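DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809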
Abstract
Text classification performance depends considerably on the feature type, that is, the unit extracted from the text and presented to the classification algorithm. Character N-grams, word roots, word stems, and full words have all been used as features for Arabic text classification. According to a survey of the current literature, no prior study has examined the effect of root N-grams and stem N-grams (N consecutive roots or stems) on Arabic text classification performance. We therefore conducted 108 experiments. Three feature types (1-grams, 2-grams, and 3-grams) of roots, stems, and full words were used. Chi-square was employed for feature selection, with three thresholds on the number of features (100, 500, and 1000). Term frequency-inverse document frequency (TF-IDF) was used as the representation scheme. Three classifiers were applied: Naive Bayes, K-Nearest Neighbor, and Support Vector Machine. The results show that root 1-grams yield better classification performance for Arabic text than stem or word N-grams. They also show that classification performance decreases as the number of N-grams increases. With 1-grams, the Support Vector Machine outperforms Naive Bayes and K-Nearest Neighbor. With K-Nearest Neighbor, however, root 2-grams achieved the best performance, whereas root 3-grams achieved the best performance with the Support Vector Machine.
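As an illustration of the feature construction described in the abstract, the following minimal Python sketch builds N-grams of consecutive tokens (roots, stems, or full words) and weights them with TF-IDF. It is not the authors' implementation: the function names `token_ngrams` and `tfidf`, the idf variant log(N / df), and the transliterated placeholder tokens are all assumptions made for this example, and root or stem extraction is assumed to have been done beforehand by a separate tool. Chi-square selection and the classifiers are not shown.

```python
import math
from collections import Counter


def token_ngrams(tokens, n):
    """Return all N-grams of n consecutive tokens, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs_features):
    """Weight every feature by raw term frequency times log(N / df).

    This plain idf variant is an assumption; the abstract does not say
    which TF-IDF formulation was used.
    """
    n_docs = len(docs_features)
    df = Counter()
    for feats in docs_features:
        df.update(set(feats))  # count each feature once per document
    weighted = []
    for feats in docs_features:
        tf = Counter(feats)
        weighted.append({f: tf[f] * math.log(n_docs / df[f]) for f in tf})
    return weighted


# Hypothetical documents, already reduced to roots or stems
# (placeholder tokens, not real data from the study).
docs = [
    ["ktb", "drs", "elm", "ktb"],
    ["drs", "elm", "lab"],
]

# 1-gram, 2-gram, and 3-gram features of the first document.
for n in (1, 2, 3):
    print(n, token_ngrams(docs[0], n))

# TF-IDF weights over 1-gram features for both documents.
weights = tfidf([token_ngrams(d, 1) for d in docs])
```

Larger N produces fewer and sparser features per document, which is one plausible reason a study might observe differences across N; the sketch only shows how the features are formed and makes no claim about the reported results.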
Pages: 7