An Ensemble Multi-label Themes-Based Classification for Holy Qur'an Verses Using Word2Vec Embedding

Cited by: 4

Authors:

Mohamed, Ensaf Hussein [1]

El-Behaidy, Wessam H. [1]

Affiliations:

[1] Helwan Univ, Fac Comp & Artificial Intelligence, Cairo, Egypt

Source:

ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING | 2021 / Vol. 46 / No. 04

Keywords:

Multi-label classification; Holy Quran; Arabic NLP; Machine learning; Word2vec; TF-IDF

DOI:

10.1007/s13369-020-05184-0

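Chinese Library Classification (CLC) number:

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]

Discipline classification codes:

07; 0710; 09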

Abstract:

Automatic themes-based classification of Quran verses is the process of assigning verses to predefined categories or themes. It is an essential task for all Muslims and for people interested in studying the Quran. Quran themes-based classification could be used in many natural language processing (NLP) applications, such as search engines, data mining, question-answering systems, and information retrieval. This paper presents an ensemble multi-label classification model that automatically identifies and classifies Quran verses based on themes/topics. The model is composed of four phases: pre-processing, data vectorization, binary relevance classification, and a voting module. First, the verses of the second chapter of the Quran (Al-Baqarah) are tokenized and normalized, and the topics of these verses are manually labeled based on the "Mushaf Al-Tajweed" classification. Second, verses are converted into feature vectors using term frequency-inverse document frequency (TF-IDF) and word2vec techniques. Word2vec is used to capture the semantic meaning of Quranic words and to improve performance; the embeddings are trained on a collected classic Arabic corpus of 200 million words. Third, the binary relevance multi-label classification technique is applied using three different classifiers (logistic regression, support vector machine, and random forest), which categorize verses into 393 topics/tags. Finally, the voting module picks the tags with the maximum prediction probability among the three classifiers. The results of the three classifiers and the ensemble model are compared against "Mushaf Al-Tajweed." The ensemble model outperforms the three individual classifiers. Its average Hamming loss, recall, precision, and F1-score are 0.224, 81%, 75%, and 77%, respectively.
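The voting step and the Hamming-loss metric named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not spell out the voting rule, so the sketch assumes that, for each tag, the highest probability reported by any of the three classifiers is kept and the tag is assigned when that probability reaches a threshold of 0.5. The function names, the threshold, the three placeholder tags, and all probability values are hypothetical.

```python
from typing import Dict, List


def max_probability_vote(
    probas: Dict[str, List[float]], threshold: float = 0.5
) -> List[int]:
    """Assumed voting rule: for each tag, keep the highest probability any
    classifier gave it, then assign the tag if it reaches the threshold."""
    n_tags = len(next(iter(probas.values())))
    votes = []
    for j in range(n_tags):
        best = max(p[j] for p in probas.values())
        votes.append(1 if best >= threshold else 0)
    return votes


def hamming_loss(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of (verse, tag) cells where prediction and gold label differ."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


# Invented probabilities for ONE verse over three placeholder tags
# (tag_a, tag_b, tag_c); the paper itself works with 393 tags.
probas = {
    "logistic_regression": [0.62, 0.20, 0.10],
    "svm": [0.40, 0.55, 0.05],
    "random_forest": [0.30, 0.45, 0.35],
}

predicted = max_probability_vote(probas)   # one 0/1 decision per tag
gold = [1, 0, 0]                           # invented gold labels
loss = hamming_loss([gold], [predicted])   # share of mismatched cells
```

Taking the per-tag maximum favors recall, since a single confident classifier is enough to assign a tag; averaging the probabilities would be a more conservative alternative.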

Pages: 3519-3529

Number of pages: 11
