An Ensemble Multi-label Themes-Based Classification for Holy Qur'an Verses Using Word2Vec Embedding

Cited by: 4
Authors
Mohamed, Ensaf Hussein [1]
El-Behaidy, Wessam H. [1]
Affiliation
[1] Helwan Univ, Fac Comp & Artificial Intelligence, Cairo, Egypt
Keywords
Multi-label classification; Holy Quran; Arabic NLP; Machine learning; Word2vec; TF-IDF
DOI
10.1007/s13369-020-05184-0
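Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09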
Abstract
Automatic themes-based classification of Quran verses is the process of assigning verses to predefined categories or themes. It is an essential task for all Muslims and for anyone interested in studying the Quran. Themes-based classification of the Quran can be used in many natural language processing (NLP) applications, such as search engines, data mining, question-answering systems, and information retrieval. This paper presents an ensemble multi-label classification model that automatically identifies and classifies Quran verses by theme/topic. The model is composed of four phases: pre-processing, data vectorization, binary relevance classification, and a voting module. First, the verses of the second chapter of the Quran (Al-Baqarah) are tokenized and normalized, and their topics are manually labeled based on the "Mushaf Al-Tajweed" classification. Second, the verses are converted into feature vectors using term frequency-inverse document frequency (TF-IDF) and word2vec. Word2vec is used to capture the semantic meaning of Quranic words and to improve performance; the word2vec embeddings are trained on a collected classical Arabic corpus of 200 million words. Third, the binary relevance multi-label classification technique is applied using three different classifiers (logistic regression, support vector machine, and random forest), which categorize verses into 393 topics/tags. Finally, the voting module picks the tags with the maximum prediction probability among the three classifiers. The results of the three classifiers and the ensemble model are compared against "Mushaf Al-Tajweed." The ensemble model outperforms the three individual classifiers: its average Hamming loss, recall, precision, and F1-score are 0.224, 81%, 75%, and 77%, respectively.
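To make the voting step and the Hamming loss metric concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes one possible reading of the abstract: for each verse and tag, the highest probability reported by any of the three classifiers is kept, and the tag is assigned when that probability reaches a threshold of 0.5. The tag names, the threshold, the probability values, and the function names are all hypothetical.

```python
from typing import Dict, List

# Hypothetical tag names; the paper uses 393 topics/tags.
TAGS = ["tag_a", "tag_b", "tag_c"]

# Made-up per-classifier probabilities: one row per verse, one column per tag.
probabilities: Dict[str, List[List[float]]] = {
    "logistic_regression": [[0.70, 0.20, 0.10], [0.30, 0.60, 0.40]],
    "svm":                 [[0.55, 0.45, 0.05], [0.20, 0.80, 0.55]],
    "random_forest":       [[0.90, 0.30, 0.20], [0.10, 0.40, 0.35]],
}


def vote_max_probability(
    per_model: Dict[str, List[List[float]]], threshold: float = 0.5
) -> List[List[int]]:
    """For each verse and tag, keep the highest probability any model gave,
    then assign the tag when that probability reaches the threshold.
    (Assumed interpretation of the voting module, not confirmed by the source.)"""
    models = list(per_model.values())
    n_verses = len(models[0])
    n_tags = len(models[0][0])
    predictions = []
    for v in range(n_verses):
        row = []
        for t in range(n_tags):
            best = max(model[v][t] for model in models)
            row.append(1 if best >= threshold else 0)
        predictions.append(row)
    return predictions


def hamming_loss(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of (verse, tag) cells where prediction and label differ."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        int(a != b)
        for true_row, pred_row in zip(y_true, y_pred)
        for a, b in zip(true_row, pred_row)
    )
    return wrong / total


# Made-up ground-truth labels for the two example verses.
y_true = [[1, 0, 0], [0, 1, 0]]
y_pred = vote_max_probability(probabilities)

for verse_index, row in enumerate(y_pred):
    assigned = [tag for tag, flag in zip(TAGS, row) if flag]
    print(f"verse {verse_index}: {assigned}")
print(f"Hamming loss: {hamming_loss(y_true, y_pred):.3f}")
```

In this toy data the second verse receives one extra tag that is not in the ground truth, so one of the six (verse, tag) cells is wrong. Lower Hamming loss is better, since it counts the share of label decisions that disagree with the reference labeling.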
Pages: 3519-3529
Page count: 11