Arabic Spam Tweets Classification: A Comprehensive Machine Learning Approach

Authors
Hantom, Wafa Hussain [1]
Rahman, Atta [1]
Affiliation
[1] Imam Abdulrahman Bin Faisal Univ IAU, Coll Comp Sci & Informat Technol CCSIT, Dept Comp Sci CS, POB 1982, Dammam 31441, Saudi Arabia
Keywords
Arabic natural language processing; deep learning; cybersecurity; random forest; tweet spam detection; LSTM
DOI
10.3390/ai5030052
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Spam tweets are among the most common problems faced by users of Twitter (also known as X), both individuals and organizations. The problem keeps growing as social media platforms gain popularity and users. Spammers exploit this reach to post texts, images, and videos containing suspicious links that can be used to spread viruses, rumors, negative marketing, and sarcasm, and potentially to compromise users' information. Spam detection is an active research area in natural language processing (NLP) and cybersecurity. Several studies have addressed it, but they focus mainly on English. Arabic tweet spam detection still has a long way to go, particularly for the diverse dialects other than Modern Standard Arabic (MSA), since MSA is seldom used in tweets. The situation demands an automated, robust, and efficient approach to detecting Arabic spam tweets. To address this, we investigate several machine learning and deep learning models for detecting spam tweets in Arabic: Random Forest (RF), Support Vector Machine (SVM), Naive Bayes (NB), and Long Short-Term Memory (LSTM). We consider both the words and the meaning of the tweet text. Across several experiments, the proposed models produced promising results compared with previous approaches on the same and on different datasets. The RF classifier achieved 96.78% accuracy and the LSTM classifier 94.56%, followed by the SVM classifier at 82%. In terms of F1-score, RF, LSTM, and SVM improve by 21.38%, 19.16%, and 5.2%, respectively, over prior schemes evaluated on the same dataset.
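As a rough illustration of the word-based classification and the metrics named in the abstract, the following is a minimal sketch of a multinomial Naive Bayes spam classifier with accuracy and F1-score, written in plain Python. It is not the authors' pipeline: the class name `NaiveBayes`, the helper functions, the whitespace tokenizer, and the tiny Arabic toy dataset are all assumptions made for this example, and the RF, SVM, and LSTM models and any Arabic-specific preprocessing are not shown.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Whitespace tokenization only; normalization, diacritic removal and
    # stemming for Arabic are deliberately left out of this sketch.
    return text.split()


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.word_counts[label][word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive="spam"):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, invented for illustration (not from the paper's dataset).
train_texts = [
    "اربح جائزة مجانا الآن",
    "اضغط الرابط واربح",
    "جائزة مجانا اضغط الرابط",
    "صباح الخير يا صديقي",
    "اجتماع الفريق غدا صباحا",
    "شكرا على المساعدة اليوم",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

test_texts = ["اربح جائزة مجانا", "صباح الخير", "اضغط الرابط", "شكرا يا صديقي"]
test_labels = ["spam", "ham", "spam", "ham"]

model = NaiveBayes().fit(train_texts, train_labels)
predictions = [model.predict(t) for t in test_texts]
print("accuracy:", accuracy(test_labels, predictions))
print("F1 (spam):", f1_score(test_labels, predictions))
```

The F1-score is the harmonic mean of precision and recall on the spam class, which is why it can differ noticeably from accuracy when the two classes are imbalanced.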
Pages: 1049-1065
Page count: 17