Arabic Spam Tweets Classification: A Comprehensive Machine Learning Approach

Cited by: 0
Authors
Hantom, Wafa Hussain [1]
Rahman, Atta [1]
Affiliations
[1] Imam Abdulrahman Bin Faisal Univ IAU, Coll Comp Sci & Informat Technol CCSIT, Dept Comp Sci CS, POB 1982, Dammam 31441, Saudi Arabia
Keywords
Arabic natural language processing; deep learning; cybersecurity; random forest; tweet spam detection; LSTM
DOI
10.3390/ai5030052
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Spam tweets are among the most common problems facing Twitter (also known as X) users, both individuals and organizations. The problem keeps growing as social media platforms gain popularity and users. Spammers exploit this reach to post texts, images, and videos containing suspicious links that can spread viruses, rumors, negative marketing, and sarcasm, and potentially compromise users' information. Spam detection is an active research area in natural language processing (NLP) and cybersecurity. Several studies have addressed it, but they focus mainly on English. Arabic tweet spam detection still has a long way to go, particularly for the diverse dialects other than Modern Standard Arabic (MSA), since MSA is seldom used in tweets. The situation calls for an automated, robust, and efficient approach to detecting Arabic spam tweets. To that end, this research investigates several machine learning and deep learning models for detecting spam tweets in Arabic: Random Forest (RF), Support Vector Machine (SVM), Naive Bayes (NB), and Long Short-Term Memory (LSTM). The approach considers both the words and the meaning of the tweet text. Across several experiments, the proposed models produced promising results compared with previous approaches on the same and on different datasets. The RF classifier achieved 96.78% accuracy and the LSTM classifier 94.56%, followed by the SVM classifier at 82%. In terms of F1-score, RF, LSTM, and SVM improved by 21.38%, 19.16%, and 5.2%, respectively, over schemes evaluated on the same dataset.
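The abstract reports both accuracy and F1-score, and the two can diverge when spam and non-spam tweets are unevenly represented. As a reminder of how they relate, the following is a minimal sketch, not the authors' evaluation code. The function name `binary_metrics`, the label strings "spam" and "ham", and the toy label lists are all assumptions made for illustration and are not data from the paper.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive="spam"):
    """Return accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")

    # Tally the four confusion-matrix cells.
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn

    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels (NOT results from the paper).
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]
metrics = binary_metrics(y_true, y_pred)
```

On these toy labels the accuracy is 0.75 while precision, recall, and F1 are each 2/3, which shows why a paper may report F1 alongside accuracy.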
Pages: 1049-1065
Number of pages: 17