A hybrid classification method for Twitter spam detection based on differential evolution and random forest

Cited by: 29
Authors
Bazzaz Abkenar, Sepideh [1]
Mahdipour, Ebrahim [1]
Jameii, Seyed Mahdi [2]
Haghi Kashani, Mostafa [2]
Affiliations
[1] Islamic Azad Univ, Sci & Res Branch, Dept Comp Engn, Tehran, Iran
[2] Islamic Azad Univ, Shahr E Qods Branch, Dept Comp Engn, Tehran, Iran
Source
Keywords
imbalanced dataset; machine learning; social networks; spam; Twitter
DOI
10.1002/cpe.6381
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Social networking services are online platforms distributed across many computers over long distances. Twitter is the most popular microblogging site, allowing users to share their opinions and report real-world events. Its popularity and ease of use have also attracted spammers, which makes spam detection one of the most critical problems on the platform. To provide a spam-free environment, it is necessary to identify and filter both spam tweets and the accounts that post them. This paper presents a hybrid method, based on the Synthetic Minority Over-sampling TEchnique (SMOTE) and Differential Evolution (DE), to improve the spam detection rate on real Twitter datasets. SMOTE is applied to counter the imbalanced class distribution of the datasets, while DE is used to tune the hyperparameters of a Random Forest (RF) classifier. Evaluation results show that, compared with related work, the presented method significantly improves classification performance on imbalanced datasets. The optimized RF achieves an F1-score of 98.97% and an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.999, which demonstrates the high efficiency of the proposed method.
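The two building blocks named in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the abstract does not state which DE variant, which RF hyperparameters, or which search ranges were used, so the DE/rand/1/bin scheme, the settings `f=0.8` and `cr=0.9`, the two tuned parameters (`n_trees`, `max_depth`), their bounds, and the function names are all assumptions. In particular, `toy_loss` is a hypothetical placeholder for a real objective such as one minus the cross-validated F1-score of a Random Forest; it produces no real results.

```python
import random


def smote(minority, n_new, k=3, rng=None):
    """SMOTE-style oversampling: interpolate between a minority sample
    and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def differential_evolution(objective, bounds, pop_size=12, generations=40,
                           f=0.8, cr=0.9, rng=None):
    """Minimise `objective` over a box with DE/rand/1/bin (assumed variant)."""
    rng = rng or random.Random(0)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, none of them the current target.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # one gene always comes from the mutant
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < cr:
                    gene = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    gene = min(max(gene, lo), hi)  # clip to the search box
                else:
                    gene = pop[i][d]
                trial.append(gene)
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_loss(params):
    """HYPOTHETICAL stand-in for '1 - cross-validated F1 of a Random Forest'.
    A real objective would round the parameters, train the forest on the
    SMOTE-balanced training folds, and score it on held-out data."""
    n_trees, max_depth = params
    return ((n_trees - 200) / 100) ** 2 + ((max_depth - 12) / 10) ** 2


# Made-up minority-class feature vectors, for illustration only.
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
extra = smote(minority, n_new=6)

# Assumed search box: (n_trees, max_depth).
bounds = [(10, 500), (2, 30)]
best_params, best_loss = differential_evolution(toy_loss, bounds)
```

In a real pipeline, oversampling should be applied only to the training portion of each split, so that synthetic samples never leak into the data used to score a candidate hyperparameter vector.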
Pages: 20