6 Million Spam Tweets: A Large Ground Truth for Timely Twitter Spam Detection

Cited by: 0
Authors
Chen, Chao [1]
Zhang, Jun [1, 2]
Chen, Xiao [1]
Xiang, Yang [1]
Zhou, Wanlei [1]
Affiliations
[1] Deakin Univ, Sch Informat Technol, Geelong, Vic 3125, Australia
[2] Southwest Univ, Sch Comp & Informat Sci, Chongqing 400715, Peoples R China
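Keywords
DOI
Not available
Chinese Library Classification
TN [Electronic technology, communication technology];
Discipline classification code
0809;
Abstract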
In recent years, Twitter has changed the way people communicate and get news in their daily lives. Because of its popularity, Twitter has also become a main target for spamming activities. To stop spammers, Twitter uses Google SafeBrowsing to detect and block spam links. Although blacklists can block malicious URLs embedded in tweets, their lag time hinders their ability to protect users in real time. Researchers have therefore begun to apply different machine learning algorithms to detect Twitter spam. However, owing to the lack of a large ground truth, there has been no comprehensive evaluation of each algorithm's performance for real-time Twitter spam detection. To carry out a thorough evaluation, we collected a large dataset of over 600 million public tweets. We further labelled around 6.5 million spam tweets and extracted 12 lightweight features, which can be used for online detection. In addition, we conducted a number of experiments on six machine learning algorithms under various conditions to better understand their effectiveness and weaknesses for timely Twitter spam detection. We will make our labelled dataset available to researchers who are interested in validating or extending our work.
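As an illustration of what "lightweight" features for online detection might look like, the sketch below computes a few counts directly from a single tweet's text, with no history or graph lookups. The function name `lightweight_features` and the five features shown are assumptions chosen for illustration; the abstract does not list the 12 features actually used, and this is not the authors' implementation.

```python
import re

# Hypothetical patterns for illustration; the abstract does not specify
# how URLs, hashtags, or mentions were identified.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def lightweight_features(text: str) -> dict:
    """Return simple per-tweet counts computable from the text alone."""
    return {
        "n_chars": len(text),                            # tweet length in characters
        "n_digits": sum(ch.isdigit() for ch in text),    # number of digit characters
        "n_urls": len(URL_RE.findall(text)),             # embedded links
        "n_hashtags": len(HASHTAG_RE.findall(text)),     # hashtags
        "n_mentions": len(MENTION_RE.findall(text)),     # user mentions
    }


if __name__ == "__main__":
    sample = "Win a FREE phone now! http://example.com/x #free #win @someone"
    print(lightweight_features(sample))
```

Feature vectors of this kind could then be passed to any standard classifier; because each value depends only on the tweet itself, extraction cost stays constant per tweet, which is what makes such features plausible for timely detection.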
Pages: 7065-7070
Page count: 6