Arabic text classification using deep learning models

Cited by: 132
Authors
Elnagar, Ashraf [1]
Al-Debsi, Ridhwan [1]
Einea, Omar [1]
Affiliation
[1] University of Sharjah, Department of Computer Science, Machine Learning & NLP Research Group, Sharjah, United Arab Emirates
Keywords
Arabic text classification/categorization; Single-label text categorization; Multi-label text categorization; Word embedding; Deep learning; SANAD; NADiA; SENTIMENT ANALYSIS; CATEGORIZATION; IDENTIFICATION; PERFORMANCE; MACHINE; FUTURE;
DOI
10.1016/j.ipm.2019.102121
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Text classification, or categorization, is the process of automatically tagging a textual document with the most relevant labels or categories. When the number of labels is restricted to one, the task becomes single-label text categorization; the multi-label version is more challenging. For the Arabic language, both tasks (especially the latter) become harder in the absence of large, free, rich, and rational Arabic datasets. Therefore, we introduce new rich and unbiased datasets for both the single-label (SANAD) and the multi-label (NADiA) Arabic text categorization tasks. Both corpora are made freely available to the Arabic computational linguistics research community. Further, we present an extensive comparison of several deep learning (DL) models for Arabic text categorization in order to evaluate the effectiveness of such models on SANAD and NADiA. A unique characteristic of our proposed work, compared to existing work, is that it does not require a pre-processing phase and is fully based on deep learning models. In addition, we studied the impact of using word2vec embedding models to improve the performance of the classification tasks. Our experimental results showed solid performance of all models on the SANAD corpus, with a minimum accuracy of 91.18%, achieved by convolutional-GRU, and a top accuracy of 96.94%, achieved by attention-GRU. As for NADiA, attention-GRU achieved the highest overall accuracy of 88.68% for a maximum subset of 10 categories on the "Masrawy" dataset.
Pages: 17
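
The abstract distinguishes single-label from multi-label categorization and reports accuracy for both. The sketch below illustrates one way that distinction could be scored. It is a minimal illustration, not the authors' evaluation code: the function names and the toy labels are hypothetical, and the use of exact-match (subset) accuracy for the multi-label case is an assumption, since the abstract does not define how "overall accuracy" was computed for NADiA.

```python
from typing import List, Set


def single_label_accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of documents whose single predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def subset_accuracy(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Fraction of documents whose predicted label SET matches the true set exactly.

    Assumption: exact-match scoring; a partially correct set counts as wrong.
    """
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy data, not taken from SANAD or NADiA.
single_true = ["sports", "politics", "finance", "sports"]
single_pred = ["sports", "politics", "sports", "sports"]

multi_true = [{"sports", "health"}, {"politics"}, {"finance", "tech"}]
multi_pred = [{"sports", "health"}, {"politics", "finance"}, {"finance", "tech"}]

single_acc = single_label_accuracy(single_true, single_pred)
multi_acc = subset_accuracy(multi_true, multi_pred)
```

Under exact-match scoring, the second multi-label document counts as an error even though one of its two predicted labels is correct, which is one reason multi-label scores tend to be lower than single-label ones.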