6 Million Spam Tweets: A Large Ground Truth for Timely Twitter Spam Detection

Cited by: 0
Authors
Chen, Chao [1]
Zhang, Jun [1, 2]
Chen, Xiao [1]
Xiang, Yang [1]
Zhou, Wanlei [1]
Affiliations
[1] Deakin Univ, Sch Informat Technol, Geelong, Vic 3125, Australia
[2] Southwest Univ, Sch Comp & Informat Sci, Chongqing 400715, Peoples R China
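DOI
Not available
Chinese Library Classification
TN [Electronic technology, communication technology];
Discipline classification code
0809;
Abstract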
In recent years, Twitter has changed the way people communicate and get news in their daily lives. Its popularity has also made it a main target for spamming activities. To stop spammers, Twitter uses Google SafeBrowsing to detect and block spam links. Although blacklists can block malicious URLs embedded in tweets, their lag time hinders their ability to protect users in real time. Researchers have therefore begun to apply different machine learning algorithms to detect Twitter spam. However, because a large ground truth has been lacking, there has been no comprehensive evaluation of each algorithm's performance for real-time Twitter spam detection. To carry out a thorough evaluation, we collected a large dataset of over 600 million public tweets. We further labelled around 6.5 million spam tweets and extracted 12 lightweight features, which can be used for online detection. In addition, we conducted a number of experiments on six machine learning algorithms under various conditions to better understand their effectiveness and weaknesses for timely Twitter spam detection. We will make our labelled dataset available to researchers who are interested in validating or extending our work.
Pages: 7065-7070
Number of pages: 6