Effect of Word Segmentation on Arabic Text Classification

Cited by: 0
Authors
Al-Thubaity, Abdulmohsen [1]
Al-Subaie, Abdullah [1]
Affiliation
[1] King Abdulaziz City Sci & Technol, Natl Ctr Comp Technol & Appl Math, Riyadh, Saudi Arabia
Keywords
Arabic text classification; text preprocessing; classification performance
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The preprocessing stage is one of the factors that affect the accuracy of text classification. Text preprocessing involves several steps, such as removing stop words, punctuation, and numerals. For Arabic text classification, stemming and root extraction have been proposed as additional preprocessing steps, with the resulting stems and roots used as classification features. In this study, we propose word segmentation as a further preprocessing step. We used a dataset of 4,900 newspaper articles distributed evenly across seven classes (700 per class) and ran our experiments on segmented and nonsegmented versions of this dataset. We used the chi-squared statistic to select the top-ranked features, LTC as the term-weighting (representation) scheme, and a support vector machine (SVM) as the classifier. Using accuracy, precision, recall, and F-measure, we evaluated the orthographic word form as a feature for Arabic text classification before and after segmentation. In all of our experiments, classification performance on the segmented dataset exceeded that on the nonsegmented dataset for the same number of features. Furthermore, the segmented dataset can match the classification performance of the nonsegmented dataset while using fewer features.
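The feature-selection, weighting, and evaluation steps named in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: documents are assumed to be already tokenized (and, for the segmented variant, already split by some segmenter, which is not shown); a term's chi-squared score is taken as its maximum over classes, one common convention; LTC is read in SMART notation as (1 + log tf) × log(N/df) followed by cosine normalization; and precision, recall, and F-measure are computed for one class at a time. The function names and toy data are hypothetical, and the SVM stage is omitted (the resulting vectors would be fed to any linear SVM).

```python
import math
from collections import Counter


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-squared statistic for one term/class pair from a 2x2 contingency table.

    a: docs in the class containing the term    b: docs outside the class containing it
    c: docs in the class lacking the term       d: docs outside the class lacking it
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs, labels, k):
    """Score each term by its maximum chi-squared value over classes (an assumed
    convention) and keep the k highest-scoring terms."""
    doc_sets = [set(tokens) for tokens in docs]
    vocab = set().union(*doc_sets)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls in set(labels):
            a = sum(1 for s, y in zip(doc_sets, labels) if y == cls and term in s)
            b = sum(1 for s, y in zip(doc_sets, labels) if y != cls and term in s)
            c = labels.count(cls) - a
            d = len(labels) - a - b - c
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best
    # Highest score first; ties broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


def ltc_vectors(docs, features):
    """LTC weighting: (1 + log tf) * log(N / df), then cosine (L2) normalization."""
    n = len(docs)
    df = Counter(t for tokens in docs for t in set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        w = [(1 + math.log(tf[t])) * math.log(n / df[t]) if tf[t] else 0.0
             for t in features]
        norm = math.sqrt(sum(x * x for x in w))
        vectors.append([x / norm for x in w] if norm else w)
    return vectors


def precision_recall_f(gold, pred, cls):
    """Precision, recall, and F-measure (F1) for a single class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Hypothetical toy corpus: two classes, tokens assumed already preprocessed.
docs = [["goal", "match", "team"], ["goal", "team", "coach"],
        ["market", "stock", "price"], ["stock", "price", "bank"]]
labels = ["sports", "sports", "economy", "economy"]
features = select_features(docs, labels, k=4)
vectors = ltc_vectors(docs, features)
print(features)  # ['goal', 'price', 'stock', 'team']
```

On this toy corpus, the four terms that appear in both documents of exactly one class score highest and are kept, and each LTC vector has unit length. Comparing segmented and nonsegmented input would amount to running the same pipeline on the two token streams with the same k.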
Pages: 127-131
Page count: 5