An Ensemble Multi-label Themes-Based Classification for Holy Qur'an Verses Using Word2Vec Embedding

Cited by: 4
Authors
Mohamed, Ensaf Hussein [1]
El-Behaidy, Wessam H. [1]
Affiliations
[1] Helwan Univ, Fac Comp & Artificial Intelligence, Cairo, Egypt
Keywords
Multi-label classification; Holy Quran; Arabic NLP; Machine learning; Word2vec; TF-IDF;
DOI
10.1007/s13369-020-05184-0
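Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09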
Abstract
Automatic themes-based classification of Quran verses is the process of assigning verses to predefined categories or themes. It is an essential task for all Muslims and for people interested in studying the Quran. Themes-based classification of the Quran could be used in many natural language processing (NLP) applications, such as search engines, data mining, question-answering systems, and information retrieval. This paper presents an ensemble multi-label classification model that automatically identifies and classifies Quran verses based on their themes/topics. The model is composed of four phases: pre-processing, data vectorization, binary relevance classification, and a voting module. First, the verses of the second chapter of the Quran (Al-Baqarah) are tokenized and normalized, and the topics of these verses are manually labeled based on the "Mushaf Al-Tajweed" classification. Second, the verses are converted into feature vectors using term frequency-inverse document frequency (TF-IDF) and word2vec techniques. Word2vec is used to capture the semantic meaning of Quranic words and to improve performance; the word2vec embeddings are trained on a collected classical Arabic corpus of 200 million words. Third, the binary relevance multi-label classification technique is applied using three different classifiers (logistic regression, support vector machine, and random forest), which categorize verses into 393 topics/tags. Finally, the voting module picks the tags with the maximum prediction probability among the three classifiers. The results of the three classifiers and of the ensemble model are compared against "Mushaf Al-Tajweed." The ensemble model outperforms the three individual classifiers. Its average Hamming loss, recall, precision, and F1-score are 0.224, 81%, 75%, and 77%, respectively.
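The voting phase and the Hamming loss metric mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "picking the tags with the maximum prediction probability" means taking, for each verse and tag, the highest probability reported by any of the three classifiers and keeping the tag when that value reaches a cutoff. The function names, the 0.5 threshold, and the toy probabilities below are all hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]  # rows = verses, columns = tags


def max_probability_vote(prob_matrices: List[Matrix], threshold: float = 0.5) -> List[List[int]]:
    """Combine per-classifier tag probabilities by taking the maximum.

    Assumption: a tag is assigned to a verse when the largest probability
    among the classifiers is at least `threshold` (hypothetical cutoff).
    """
    n_verses = len(prob_matrices[0])
    n_tags = len(prob_matrices[0][0])
    predictions = []
    for i in range(n_verses):
        row = []
        for j in range(n_tags):
            best = max(m[i][j] for m in prob_matrices)
            row.append(1 if best >= threshold else 0)
        predictions.append(row)
    return predictions


def hamming_loss(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of (verse, tag) cells where prediction and ground truth differ."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


# Toy data (invented): 2 verses, 3 tags, one probability matrix per classifier.
lr = [[0.90, 0.20, 0.40], [0.10, 0.60, 0.30]]   # logistic regression
svm = [[0.70, 0.10, 0.55], [0.20, 0.40, 0.20]]  # support vector machine
rf = [[0.80, 0.30, 0.30], [0.05, 0.70, 0.45]]   # random forest

y_true = [[1, 0, 0], [0, 1, 0]]
y_pred = max_probability_vote([lr, svm, rf])
loss = hamming_loss(y_true, y_pred)
```

With this toy data, the vote assigns tags 0 and 2 to the first verse and tag 1 to the second, so one of the six verse-tag cells disagrees with the ground truth. Taking the maximum favors recall, since any single confident classifier is enough to assign a tag. That is one plausible reading of the abstract, not a confirmed detail of the paper.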
Pages: 3519-3529
Page count: 11