Twitter spam account detection based on clustering and classification methods

Cited by: 0
Authors
Kayode Sakariyah Adewole
Tao Han
Wanqing Wu
Houbing Song
Arun Kumar Sangaiah
Institutions
[1] University of Ilorin, Faculty of Communication and Information Sciences
[2] Dongguan University of Technology, DGUT
[3] Shenzhen Institutes of Advanced Technology (SIAT), CNAM Institute
[4] Chinese Academy of Sciences (CAS), CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems
[5] Embry-Riddle Aeronautical University
[6] Vellore Institute of Technology
Keywords
Online social network; Spam detection; Fake account; Clustering; Classification
DOI
Not available
Abstract
The Twitter social network has grown in popularity as the social activity of its registered users has increased. Twitter serves a dual function: it acts as a microblogging online social network (OSN) and, at the same time, as a news update platform. Recently, the growth in Twitter social interactions has attracted the attention of cybercriminals. Spammers have used Twitter to spread malicious messages, post phishing links, flood the network with fake accounts, and engage in other malicious activities. Detecting the network of spammers who engage in these activities is an important step toward identifying individual spam accounts. Researchers have proposed a number of approaches to identify groups of spammers; however, each of these approaches addresses only a specific category of spammer. This paper proposes a different approach to detecting spammers on Twitter, based on the similarities that exist among spam accounts. A number of features were introduced to improve the performance of the three classification algorithms selected in this study. The proposed approach applies principal component analysis (PCA) and a tuned K-means algorithm to cluster over 200,000 accounts, randomly selected from more than 2 million tweets, in order to detect clusters of spammers. Experimental results show that Random Forest achieved the highest accuracy, 96.30%, followed by the multilayer perceptron (96.00%) and the support vector machine (95.60%). An evaluation of the selected classifiers under class imbalance also showed that Random Forest achieved the highest accuracy, precision, recall, and F-measure.
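The abstract describes reducing account features with PCA and then grouping accounts with a tuned K-means algorithm. The sketch below is a minimal, dependency-free illustration of that pipeline under explicit assumptions: the four per-account features (followers, followees, tweets per day, URL ratio) and the toy data are hypothetical, PCA is computed by power iteration with deflation, and "tuning" is approximated by keeping the lowest-inertia run over several random restarts. The abstract does not say which features or which tuning procedure the authors actually used.

```python
import math
import random

def standardize(rows):
    """Scale every feature column to zero mean and unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # "or 1.0" guards against constant columns (zero standard deviation).
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def pca_project(rows, n_components, iters=200, seed=0):
    """Project centred rows onto their top principal axes.

    Power iteration with deflation on the covariance matrix: fine for a
    handful of features; a production pipeline would use an SVD routine.
    """
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    axes = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        eigval = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        axes.append(v)
        # Deflate so the next pass converges to the next-largest component.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(r[j] * a[j] for j in range(d)) for a in axes] for r in rows]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seed, max_iter=100):
    """One run of Lloyd's algorithm; returns (labels, inertia)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous centroid if a cluster empties out.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    # Final assignment against the final centroids.
    labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia

def tuned_kmeans(points, k, restarts=10):
    """Hypothetical tuning step: keep the lowest-inertia run over several seeds."""
    return min((kmeans(points, k, seed) for seed in range(restarts)),
               key=lambda result: result[1])

# Hypothetical accounts: followers, followees, tweets per day, URL ratio.
accounts = [
    [500, 300, 5, 0.10], [800, 400, 8, 0.05], [650, 350, 6, 0.08], [720, 380, 7, 0.10],
    [20, 2000, 120, 0.90], [15, 1800, 150, 0.95], [30, 2500, 100, 0.85], [25, 2200, 130, 0.90],
]
reduced = pca_project(standardize(accounts), n_components=2)
labels, inertia = tuned_kmeans(reduced, k=2)
print(labels, round(inertia, 3))
```

On this toy data the first four (legitimate-looking) accounts and the last four (spam-looking) accounts should land in different clusters; which cluster receives label 0 depends on the random seeds. In practice one would likely reach for a library implementation such as scikit-learn's `PCA` and `KMeans` (whose `n_init` parameter performs the restart-and-keep-best step) rather than hand-rolled routines.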
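The abstract reports accuracy, precision, recall, and F-measure, and evaluates the classifiers under class imbalance. The short sketch below shows how these four metrics follow from confusion-matrix counts, treating spam as the positive class. It assumes F-measure means the balanced F1 score (harmonic mean of precision and recall); the label counts are hypothetical and are not taken from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure, with `positive` (spam) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Balanced F-measure (F1): harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f_measure": f_measure}

# Hypothetical imbalanced test set: 5 spam accounts (1) and 95 legitimate ones (0).
y_true = [1] * 5 + [0] * 95

# A degenerate classifier that labels every account as legitimate.
always_legit = [0] * 100
majority_scores = binary_metrics(y_true, always_legit)
print(majority_scores)  # accuracy 0.95, but precision, recall and F-measure are all 0.0

# A classifier that catches 4 of the 5 spammers and raises 2 false alarms.
model = [1, 1, 1, 1, 0] + [1, 1] + [0] * 93
model_scores = binary_metrics(y_true, model)
print(model_scores)  # accuracy 0.97, precision ~0.667, recall 0.8, F-measure ~0.727
```

The degenerate classifier scores 95% accuracy while detecting no spammers at all, which is why precision, recall, and F-measure are reported alongside accuracy when spam accounts are a minority class.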
Pages: 4802-4837
Number of pages: 35