Effect of Word Segmentation on Arabic Text Classification

Authors
Al-Thubaity, Abdulmohsen [1]
Al-Subaie, Abdullah [1]
Affiliations
[1] King Abdulaziz City Sci & Technol, Natl Ctr Comp Technol & Appl Math, Riyadh, Saudi Arabia
Keywords
Arabic text classification; text preprocessing; classification performance
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The preprocessing stage is one of the factors that affect the accuracy of text classification. Text preprocessing involves several steps, such as removing stop words, punctuation, and numerals. For Arabic text classification, stemming and root extraction have been proposed as additional preprocessing steps; the resulting stems and roots are then used as features. In this study, we propose word segmentation as an additional preprocessing step. We used a dataset comprising 4,900 newspaper articles evenly distributed into seven classes, and we conducted our experiments on segmented and nonsegmented versions of this dataset. We used chi-squared to select top-ranked features, LTC as a representation schema, and SVM as a classifier. By measuring accuracy, precision, recall, and F-measure, we evaluated the use of word orthography as a feature for Arabic text classification before and after segmentation. In all of the experiments we conducted, classification performance on the segmented dataset exceeded that on the nonsegmented dataset with the same number of features. Furthermore, the segmented dataset attains the same classification performance as the nonsegmented dataset using fewer features.
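The feature selection and representation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each term is ranked by its maximum chi-squared score over the classes, and that LTC means log(1 + tf) multiplied by log(N / df) with cosine normalization; the abstract specifies neither detail. The toy tokens and class labels are hypothetical placeholders, and the SVM step is omitted.

```python
import math
from collections import Counter


def chi_squared(docs, labels, term, cls):
    """Chi-squared statistic for one term/class pair (2x2 contingency table)."""
    n = len(docs)
    n11 = n10 = n01 = 0
    for doc, label in zip(docs, labels):
        present = term in doc
        if present and label == cls:
            n11 += 1  # term present, document in class
        elif present:
            n10 += 1  # term present, document in another class
        elif label == cls:
            n01 += 1  # term absent, document in class
    n00 = n - n11 - n10 - n01  # term absent, document in another class
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_top_terms(docs, labels, k):
    """Keep the k terms with the highest chi-squared score (assumed: max over classes)."""
    vocab = sorted({t for doc in docs for t in doc})
    classes = sorted(set(labels))
    scores = {t: max(chi_squared(docs, labels, t, c) for c in classes) for t in vocab}
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


def ltc_vector(doc, terms, docs):
    """LTC weights (assumed variant): log(1 + tf) * log(N / df), cosine-normalized."""
    n = len(docs)
    tf = Counter(doc)
    raw = []
    for t in terms:
        df = sum(1 for d in docs if t in d)
        if df and tf[t]:
            raw.append(math.log(1 + tf[t]) * math.log(n / df))
        else:
            raw.append(0.0)
    norm = math.sqrt(sum(w * w for w in raw))
    return [w / norm if norm else 0.0 for w in raw]


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["match", "goal", "team"],
    ["goal", "team", "win"],
    ["bank", "market", "loan"],
    ["market", "loan", "rate"],
]
labels = ["sports", "sports", "economy", "economy"]

top_terms = select_top_terms(docs, labels, k=4)
vec = ltc_vector(docs[0], top_terms, docs)
```

In a comparison like the one the abstract describes, the same functions would be applied to two tokenizations of the same articles, one segmented and one not, with the resulting vectors passed to a classifier.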
Pages: 127-131
Page count: 5
Related papers
50 in total
  • [21] Arabic Word Segmentation for Better Unit of Analysis
    Benajiba, Yassine
    Zitouni, Imed
    [J]. LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010, : 1346 - 1352
  • [22] Word Segmentation of Informal Arabic with Domain Adaptation
    Monroe, Will
    Green, Spence
    Manning, Christopher D.
    [J]. PROCEEDINGS OF THE 52ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2, 2014, : 206 - 211
  • [23] Word Segmentation for Arabic Abstractive Headline Generation
    Abdelaziz, Yaser O.
    El-Beltagy, Samhaa R.
    [J]. 2021 4TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMPUTER TECHNOLOGIES (ICICT 2021), 2021, : 59 - 63
  • [24] Language model based arabic word segmentation
    Lee, YS
    Papineni, K
    Roukos, S
    Emam, O
    Hassan, H
    [J]. 41ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, 2003, : 399 - 406
  • [25] Accuracy Evaluation of Arabic Text Classification
    Sayed, Mostafa
    Salem, Rashed
    Khedr, Ayman E.
    [J]. 2017 12TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND SYSTEMS (ICCES), 2017, : 365 - 370
  • [26] Arabic Text Classification in the Legal Domain
    Ait Yahia, Ikram
    Loqman, Chakir
    [J]. 2019 THIRD INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING IN DATA SCIENCES (ICDS 2019), 2019,
  • [27] A Closer Look at Arabic Text Classification
    Abdeen, Mohammad A. R.
    AlBouq, Sami
    Elmahalawy, Ahmed
    Shehata, Sara
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2019, 10 (11) : 677 - 688
  • [28] A survey of Arabic text classification approaches
    Sayed, Mostafa
    Salem, Rashed K.
    Khedr, Ayman E.
    [J]. INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN TECHNOLOGY, 2019, 59 (03) : 236 - 251
  • [29] Neural Network for Arabic Text Classification
    Harrag, Fouzi
    El-Qawasmah, Eyas
    [J]. 2009 SECOND INTERNATIONAL CONFERENCE ON THE APPLICATIONS OF DIGITAL INFORMATION AND WEB TECHNOLOGIES (ICADIWT 2009), 2009, : 778 - +
  • [30] Arabic Text Classification: A Literature Review
    Elayeb, Bilel
    [J]. 2021 IEEE/ACS 18TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2021,