A hybrid classification method for Twitter spam detection based on differential evolution and random forest

Times cited: 29
Authors
Bazzaz Abkenar, Sepideh [1]
Mahdipour, Ebrahim [1]
Jameii, Seyed Mahdi [2]
Haghi Kashani, Mostafa [2]
Affiliations
[1] Islamic Azad Univ, Sci & Res Branch, Dept Comp Engn, Tehran, Iran
[2] Islamic Azad Univ, Shahr E Qods Branch, Dept Comp Engn, Tehran, Iran
Source
Keywords
imbalanced dataset; machine learning; social networks; spam; Twitter
DOI
10.1002/cpe.6381
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Social networking services are online platforms distributed across many computers over long distances. Twitter is the most popular microblogging site, allowing users to share their opinions and real-world events. Its popularity and ease of use have also attracted spammers, so spam detection has become one of the most critical problems on the platform. Providing a spam-free environment requires identifying and filtering both spam tweets and the accounts that post them. This paper presents a hybrid method based on the Synthetic Minority Over-sampling TEchnique (SMOTE) and Differential Evolution (DE) to improve the spam detection rate on real Twitter datasets. SMOTE addresses the imbalanced class distribution of the datasets, while DE tunes the hyperparameters of a Random Forest (RF) classifier. Evaluation results show that, compared with related work, the proposed method significantly improves classification performance on imbalanced datasets. The optimized RF achieves an F1-score of 98.97% and an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.999, demonstrating the high efficiency of the proposed method.
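The abstract combines two building blocks: SMOTE to rebalance the classes and DE to search RF hyperparameters. The sketch below illustrates both in dependency-free Python under stated assumptions; it is not the authors' implementation. The function names, the random-sample SMOTE variant with Euclidean neighbours, the classic DE/rand/1/bin scheme with F=0.8 and CR=0.9, and the toy objective are all illustrative choices. In a real pipeline the DE objective would be something like 1 minus the cross-validated F1-score of a random forest trained on SMOTE-balanced data; that training step is omitted here.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority samples (SMOTE-style interpolation).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Other minority samples, sorted by squared Euclidean distance to `base`.
        others = sorted(
            (m for j, m in enumerate(minority) if j != i),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def differential_evolution(
    objective: Callable[[Vector], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 15,
    f: float = 0.8,
    cr: float = 0.9,
    generations: int = 100,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Minimise `objective` inside a box using DE/rand/1/bin."""
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs pop_size >= 4")
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct donors, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # forces at least one mutated coordinate
            trial: Vector = []
            for d, (lo, hi) in enumerate(bounds):
                if d == j_rand or rng.random() < cr:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    v = min(max(v, lo), hi)  # clip back into the search box
                else:
                    v = pop[i][d]
                trial.append(v)
            score = objective(trial)
            if score <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, score
    best = min(range(pop_size), key=lambda idx: scores[idx])
    return pop[best], scores[best]


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(len(smote(minority, n_new=6, k=2)))  # 6 synthetic samples

    # Hypothetical stand-in for "1 - cross-validated F1 of an RF": a smooth
    # bowl whose minimum is at n_estimators=120, max_depth=12.
    def toy_loss(x: Vector) -> float:
        n_estimators, max_depth = x
        return ((n_estimators - 120) / 100) ** 2 + ((max_depth - 12) / 10) ** 2

    best, loss = differential_evolution(toy_loss, [(10, 500), (2, 40)])
    print([round(v) for v in best], loss)
```

Integer hyperparameters such as the number of trees or the maximum depth would be rounded when a DE vector is decoded into an RF configuration; the continuous search plus rounding is one common choice, assumed here rather than taken from the paper.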
Pages: 20