An Ensemble Multi-label Themes-Based Classification for Holy Qur'an Verses Using Word2Vec Embedding

Cited by: 4

Authors:

Mohamed, Ensaf Hussein [1]

El-Behaidy, Wessam H. [1]

Affiliations:

[1] Helwan Univ, Fac Comp & Artificial Intelligence, Cairo, Egypt

Source:

ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING | 2021, Vol. 46, No. 4

Keywords:

Multi-label classification; Holy Quran; Arabic NLP; Machine learning; Word2vec; TF-IDF

DOI:

10.1007/s13369-020-05184-0

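Chinese Library Classification (CLC):

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]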

Discipline classification codes:

07; 0710; 09

Abstract:

Automatic theme-based classification of Quran verses is the process of assigning verses to predefined categories or themes. It is an essential task for Muslims and for anyone interested in studying the Quran. Theme-based classification of the Quran can be used in many natural language processing (NLP) applications, such as search engines, data mining, question-answering systems, and information retrieval. This paper presents an ensemble multi-label classification model that automatically identifies and classifies Quran verses by theme or topic. The model is composed of four phases: pre-processing, data vectorization, binary relevance classification, and a voting module. First, the verses of the second chapter of the Quran (Al-Baqarah) are tokenized and normalized, and their topics are manually labeled according to the "Mushaf Al-Tajweed" classification. Second, the verses are converted into feature vectors using term frequency-inverse document frequency (TF-IDF) and word2vec. Word2vec is used to capture the semantic meaning of Quranic words and to improve performance; its embeddings are trained on a collected classical Arabic corpus of 200 million words. Third, the binary relevance multi-label classification technique is applied with three classifiers (logistic regression, support vector machine, and random forest), which categorize verses into 393 topics/tags. Finally, the voting module picks the tags with the maximum prediction probability among the three classifiers. The results of the three classifiers and of the ensemble model are compared against "Mushaf Al-Tajweed." The ensemble model outperforms each of the three individual classifiers: its average Hamming loss, recall, precision, and F1-score are 0.224, 81%, 75%, and 77%, respectively.
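The voting step and the Hamming loss metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each of the three classifiers returns a per-verse, per-tag probability, that "maximum prediction probability" means taking the highest probability for each tag across classifiers, and that a tag is assigned when that value reaches a threshold of 0.5. The function names, the threshold, and the toy numbers are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows = verses, columns = tags


def max_probability_vote(prob_matrices: List[Matrix],
                         threshold: float = 0.5) -> List[List[int]]:
    """Assumed voting rule: for each verse and tag, keep the highest
    probability among the classifiers, then threshold it to 0 or 1."""
    n_verses = len(prob_matrices[0])
    n_tags = len(prob_matrices[0][0])
    predictions = []
    for i in range(n_verses):
        row = []
        for j in range(n_tags):
            best = max(m[i][j] for m in prob_matrices)
            row.append(1 if best >= threshold else 0)
        predictions.append(row)
    return predictions


def hamming_loss(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of (verse, tag) cells where prediction and label disagree."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for true_row, pred_row in zip(y_true, y_pred)
                for t, p in zip(true_row, pred_row))
    return wrong / total


# Toy probabilities for 2 verses and 3 tags (made-up numbers, not from the paper).
lr_probs = [[0.9, 0.2, 0.4], [0.1, 0.6, 0.3]]    # logistic regression
svm_probs = [[0.7, 0.1, 0.55], [0.2, 0.4, 0.2]]  # support vector machine
rf_probs = [[0.8, 0.3, 0.45], [0.05, 0.7, 0.1]]  # random forest

y_true = [[1, 0, 0], [0, 1, 0]]
y_pred = max_probability_vote([lr_probs, svm_probs, rf_probs])
loss = hamming_loss(y_true, y_pred)
print(y_pred, loss)
```

In this toy case the third tag of the first verse is assigned only because one classifier is above the threshold, which shows how a max-based vote tends to favor recall over precision.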


Pages: 3519-3529

Page count: 11
