Automatic Arabic text categorization: A comprehensive comparative study

Cited by: 45
Authors
Hmeidi, Ismail [1 ]
Al-Ayyoub, Mahmoud [1 ]
Abdulla, Nawaf A. [1 ]
Almodawar, Abdalrahman A. [1 ]
Abooraig, Raddad [1 ]
Mahyoub, Nizar A. [1 ]
Affiliation
[1] Jordan Univ Sci & Technol, Irbid 22110, Jordan
Keywords
Arabic text categorization; classification; decision table; decision tree; K-Nearest Neighbour; light stemming; naive Bayes; RapidMiner; root-based stemming; Support Vector Machine; Weka; HYBRID APPROACH; PERFORMANCE; WORDS;
DOI
10.1177/0165551514558172
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
Text categorization or classification (TC) is concerned with placing text documents in their proper category according to their contents. TC has many applications, and a large volume of text documents is uploaded to the Internet daily, so performing this process manually is difficult and tedious; hence the need for an automated method. The usefulness of TC is manifested in different fields and needs. For instance, the ability to automatically classify an article or an email into its right class (Arts, Economics, Politics, Sports, etc.) would be appreciated by individual users as well as companies. This paper is concerned with TC of Arabic articles. It contains a comparison of the five best-known algorithms for TC. It also studies the effects of utilizing different Arabic stemmers (light and root-based stemmers) on the effectiveness of these classifiers. Furthermore, a comparison between different data mining software tools (Weka and RapidMiner) is presented. The results show that the SVM classifier provides good accuracy, especially when used with the light10 stemmer. This outcome can be used in the future as a baseline to compare with other unexplored classifiers and Arabic stemmers.
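As a rough illustration of the pipeline the abstract describes (stem the tokens, then classify), the following is a minimal sketch in plain Python. It is not the paper's setup: the paper uses Weka and RapidMiner with established stemmers such as light10, whereas `toy_light_stem`, its affix lists, the `NaiveBayes` class, and the four-word training documents here are all assumptions made up for the example. The classifier is a multinomial naive Bayes with add-one smoothing, one of the algorithm families named in the keywords.

```python
import math
from collections import Counter, defaultdict

# Toy affix lists, assumed for illustration only; this is NOT the light10 rule set.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "و")
SUFFIXES = ("ات", "ون", "ين", "ها", "ية", "ة")


def toy_light_stem(token):
    """Strip at most one prefix and one suffix, keeping at least 3 letters."""
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= 3:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= 3:
            token = token[:-len(s)]
            break
    return token


def tokenize(text, stem=None):
    """Whitespace tokenization, optionally followed by stemming."""
    tokens = text.split()
    return [stem(t) for t in tokens] if stem else tokens


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels, stem=None):
        self.stem = stem
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc, stem)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        tokens = tokenize(doc, self.stem)
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / n_docs)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical two-document training set (one Sports, one Economics).
docs = ["الفريق فاز في المباراة", "البنك رفع الفائدة"]
labels = ["Sports", "Economics"]

model = NaiveBayes().fit(docs, labels, stem=toy_light_stem)
print(model.predict("فريق مباراة"))
```

The query words appear in the training text only with the definite article attached, so they match the training vocabulary only after stemming. This is the kind of effect the stemmer comparison in the paper is concerned with, although a two-document example says nothing about real accuracy.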
Pages: 114-124
Number of pages: 11