Twitter spam account detection based on clustering and classification methods

Authors
Kayode Sakariyah Adewole
Tao Han
Wanqing Wu
Houbing Song
Arun Kumar Sangaiah
Affiliations
[1] University of Ilorin, Faculty of Communication and Information Sciences
[2] Dongguan University of Technology (DGUT)
[3] Shenzhen Institutes of Advanced Technology (SIAT), CNAM Institute
[4] Chinese Academy of Sciences (CAS), CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems
[5] Embry-Riddle Aeronautical University
[6] Vellore Institute of Technology
Keywords
Online social network; Spam detection; Fake account; Clustering; Classification
Abstract
The Twitter social network has gained popularity owing to the increase in the social activities of its registered users. Twitter serves a dual function: it is a microblogging online social network (OSN) and, at the same time, a news update platform. Recently, the growth in Twitter social interactions has attracted the attention of cybercriminals. Spammers have used Twitter to spread malicious messages, post phishing links, flood the network with fake accounts, and engage in other malicious activities. Detecting the network of spammers who engage in these activities is an important step toward identifying individual spam accounts. Researchers have proposed a number of approaches to identify groups of spammers; however, each of these approaches addresses only a specific category of spammer. This paper proposes a different approach to detecting spammers on Twitter, based on the similarities that exist among spam accounts. A number of features were introduced to improve the performance of the three classification algorithms selected in this study. The proposed approach applies principal component analysis (PCA) and a tuned K-means algorithm to cluster more than 200,000 accounts, randomly selected from over 2 million tweets, in order to detect clusters of spammers. Experimental results show that Random Forest achieved the highest accuracy (96.30%), followed by multilayer perceptron (96.00%) and support vector machine (95.60%). An evaluation of the selected classifiers under class imbalance also showed that Random Forest achieved the highest accuracy, precision, recall, and F-measure.
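The abstract names the pipeline (PCA, then a tuned K-means, then evaluation with precision, recall, and F-measure) without giving its details. The following is a minimal, standard-library-only sketch of that kind of pipeline under loudly stated assumptions: account features are already numeric and on comparable scales; "tuned" K-means is interpreted here as choosing k by the highest mean silhouette score, which may not be how the authors tuned it; and the feature values are invented. The helpers `pca_project`, `kmeans`, `silhouette`, `cluster_accounts`, and `precision_recall_f1` are hypothetical names, not the authors' code, and the Random Forest, multilayer perceptron, and SVM classifiers are not reproduced.

```python
import math
import random


def pca_project(rows, n_components, seed=0):
    """Project rows onto their top principal components (power iteration + deflation)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered features (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _step in range(200):  # power iteration: v -> dominant eigenvector of cov
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        # Deflate: remove this component so the next pass finds the next-largest one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        components.append(v)
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components] for c in centered]


def kmeans(points, k, seed=0, max_iter=100):
    """Plain Lloyd's K-means with initial centroids sampled from the data."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous centroid if a cluster ends up empty.
            updated.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if updated == centroids:  # assignments stable, so centroids no longer move
            break
        centroids = updated
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; higher means tighter, better-separated clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def cluster_accounts(features, n_components=2, k_range=range(2, 6)):
    """PCA, then K-means for each k in k_range, keeping the best silhouette score."""
    # Assumption: "tuning" K-means means selecting k by silhouette; the paper may differ.
    projected = pca_project(features, n_components)
    best_k, best_labels, best_score = None, None, -math.inf
    for k in k_range:
        labels = kmeans(projected, k)
        score = silhouette(projected, labels)
        if score > best_score:
            best_k, best_labels, best_score = k, labels, score
    return best_k, best_labels


def precision_recall_f1(y_true, y_pred, positive=1):
    """Metrics for the spam (positive) class; F-measure is the harmonic mean of P and R."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical, made-up feature rows: two groups of 10 accounts with 3 features each.
    legit = [[0.1 + 0.01 * i, 0.2 + 0.005 * i, 0.3] for i in range(10)]
    spam = [[0.9 - 0.01 * i, 0.8 - 0.005 * i, 0.7] for i in range(10)]
    k, labels = cluster_accounts(legit + spam)
    print("chosen k:", k)
    print("labels:", labels)
```

The metrics are computed for the spam class specifically because, under class imbalance, a classifier can reach high accuracy simply by labelling most accounts as legitimate; precision, recall, and F-measure on the minority class expose that failure. At the scale reported in the paper (over 200,000 accounts), library implementations such as scikit-learn's `PCA`, `KMeans`, and `silhouette_score` would be the practical choice rather than this pure-Python version.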
Pages: 4802-4837
Page count: 35