Arabic Spam Tweets Classification: A Comprehensive Machine Learning Approach

Cited by: 0

Authors:

Hantom, Wafa Hussain [1]

Rahman, Atta [1]

Affiliations:

[1] Imam Abdulrahman Bin Faisal Univ IAU, Coll Comp Sci & Informat Technol CCSIT, Dept Comp Sci CS, POB 1982, Dammam 31441, Saudi Arabia

Source:

AI | 2024 / Vol. 5 / Issue 03

Keywords:

Arabic natural language processing; deep learning; cybersecurity; random forest; tweet spam detection; LSTM;

DOI:

10.3390/ai5030052

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Spam tweets are among the most common problems faced by users of Twitter (also known as X), both individuals and organizations. The problem keeps growing with the popularity and user base of social media platforms. Spammers exploit this reach to post text, images, and videos containing suspicious links that can spread viruses, rumors, negative marketing, and sarcasm, and can potentially compromise users' information. Spam detection is among the most active research areas in natural language processing (NLP) and cybersecurity. Several studies have addressed it, but they focus mainly on English. Arabic tweet spam detection still has a long way to go, especially for the diverse dialects other than Modern Standard Arabic (MSA), since the standard dialect is seldom used in tweets. The situation demands an automated, robust, and efficient approach to Arabic spam tweet detection. To address this, the research investigates several machine learning and deep learning models for detecting spam tweets in Arabic: Random Forest (RF), Support Vector Machine (SVM), Naive Bayes (NB), and Long Short-Term Memory (LSTM). We focus on both the words and the meaning of the tweet text. Across several experiments, the proposed models produced promising results compared with previous approaches on the same dataset and on different datasets. The RF classifier achieved 96.78% accuracy and the LSTM classifier 94.56%, followed by the SVM classifier at 82%. In terms of F1-score, the RF, LSTM, and SVM classifiers improve by 21.38%, 19.16%, and 5.2%, respectively, over schemes evaluated on the same dataset.
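To make the classification and evaluation ideas in the abstract concrete, the following is a minimal sketch of one of the named model families (Naive Bayes) together with the two reported metrics (accuracy and F1-score). It is not the authors' pipeline: the whitespace tokenizer, the Laplace smoothing, the four toy tweets, and the function names `train_nb`, `predict_nb`, and `accuracy_and_f1` are all assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized texts."""
    class_counts = Counter(labels)          # documents per class
    word_counts = defaultdict(Counter)      # token counts per class
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)            # log prior
        total_words = sum(word_counts[label].values())
        for token in text.split():
            if token not in vocab:
                continue  # skip tokens never seen during training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy_and_f1(y_true, y_pred, positive="spam"):
    """Compute accuracy and the F1-score of the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy training data (invented for illustration, not taken from the paper's dataset).
train_texts = [
    "اربح جائزة مجانا الآن",
    "اضغط الرابط واربح مجانا",
    "صباح الخير يا أصدقاء",
    "مباراة اليوم كانت رائعة",
]
train_labels = ["spam", "spam", "ham", "ham"]
model = train_nb(train_texts, train_labels)
```

A real system would likely replace the whitespace split with Arabic-aware normalization and tokenization, since dialectal tweets vary widely in spelling; that preprocessing is outside the scope of this sketch.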

Pages: 1049-1065

Page count: 17

