Arabic Spam Tweets Classification: A Comprehensive Machine Learning Approach

Cited by: 0
Authors
Hantom, Wafa Hussain [1]
Rahman, Atta [1]
Affiliation
[1] Imam Abdulrahman Bin Faisal University (IAU), College of Computer Science and Information Technology (CCSIT), Department of Computer Science (CS), P.O. Box 1982, Dammam 31441, Saudi Arabia
Keywords
Arabic natural language processing; deep learning; cybersecurity; random forest; tweet spam detection; LSTM
DOI
10.3390/ai5030052
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Spam tweets are one of the most common problems faced by Twitter (also known as X) users, individuals and organizations alike, and the problem keeps growing with the popularity and user base of social media platforms. Spammers exploit this attention by posting texts, images, and videos with suspicious links that can be used to spread viruses, rumors, negative marketing, and sarcasm, and potentially to compromise users' information. Spam detection is among the most active research areas in natural language processing (NLP) and cybersecurity. Several studies have addressed it, but mainly for English. Arabic tweet spam detection still has a long way to go, particularly for the diverse dialects other than Modern Standard Arabic (MSA), since MSA is seldom used in tweets. The situation calls for an automated, robust, and efficient approach to detecting Arabic spam tweets. To address this issue, this research investigates several machine learning and deep learning models for detecting Arabic spam tweets: Random Forest (RF), Support Vector Machine (SVM), Naive Bayes (NB), and Long Short-Term Memory (LSTM). The models consider both the words and the meaning of the tweet text. Across several experiments, the proposed models produced promising results compared with previous approaches on the same and different datasets. The RF classifier achieved 96.78% accuracy and the LSTM classifier 94.56%, followed by the SVM classifier at 82%. In terms of F1-score, the RF, LSTM, and SVM classifiers improved by 21.38%, 19.16%, and 5.2%, respectively, over previous schemes evaluated on the same dataset.
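The abstract lists Naive Bayes among the investigated models and reports accuracy and F1-score, but it gives no details of features, preprocessing, or data. As a rough illustration only, the following standard-library Python sketch trains a bag-of-words multinomial Naive Bayes classifier with add-one smoothing and computes accuracy and the spam-class F1-score. It is not the authors' implementation. The toy tweets, the `tokenize` helper, and its diacritic stripping are hypothetical assumptions made for this example.

```python
import math
import re
from collections import Counter

# Arabic diacritics (tashkeel), U+064B..U+0652. Stripping them is an assumed
# normalization step, not one confirmed by the abstract.
DIACRITICS = re.compile(r"[\u064B-\u0652]")


def tokenize(text):
    """Hypothetical helper: strip diacritics, lowercase, split on whitespace."""
    return DIACRITICS.sub("", text).lower().split()


class MultinomialNB:
    """Bag-of-words multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        # Vocabulary = every token seen in any class.
        self.vocab = set().union(*self.counts.values())
        n_docs = Counter(labels)
        self.log_prior = {c: math.log(n_docs[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, doc):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in tokenize(doc):
                if tok in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.counts[c][tok] + 1) / (self.totals[c] + v))
            scores[c] = score
        return max(scores, key=scores.get)  # class with the highest log-posterior

    def predict(self, docs):
        return [self.predict_one(d) for d in docs]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive="spam"):
    """F1 of the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data (not the paper's dataset).
train_docs = [
    "اربح جوال مجانا اضغط الرابط",
    "عرض خاص خصم كبير اضغط هنا",
    "تابعنا واربح جوائز الرابط في البايو",
    "صباح الخير يا جماعة",
    "المباراة اليوم كانت رائعة",
    "الجو حار جدا في الدمام",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
test_docs = ["اضغط الرابط واربح", "الجو رائع اليوم", "خصم كبير على الجوال",
             "صباح الخير", "في الدمام عرض"]
test_labels = ["spam", "ham", "spam", "ham", "spam"]

model = MultinomialNB().fit(train_docs, train_labels)
preds = model.predict(test_docs)
print(preds)
print(f"accuracy={accuracy(test_labels, preds):.2f} f1={f1_score(test_labels, preds):.2f}")
# prints: accuracy=0.80 f1=0.80
```

On this toy data the last test tweet scores higher under the ham class and is misclassified, so one spam tweet is missed (recall 2/3, precision 1). Real experiments would need a labeled corpus of Arabic tweets covering the dialects discussed above.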
Pages: 1049-1065
Number of pages: 17