6 Million Spam Tweets: A Large Ground Truth for Timely Twitter Spam Detection

Cited by: 0
Authors
Chen, Chao [1]
Zhang, Jun [1,2]
Chen, Xiao [1]
Xiang, Yang [1]
Zhou, Wanlei [1]
Affiliations
[1] Deakin Univ, Sch Informat Technol, Geelong, Vic 3125, Australia
[2] Southwest Univ, Sch Comp & Informat Sci, Chongqing 400715, Peoples R China
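Keywords
DOI
Not available
Chinese Library Classification number
TN [Electronic technology, communication technology];
Discipline classification code
0809;
Abstract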
In recent years, Twitter has changed the way people communicate and get news in their daily lives. Its popularity has also made it a main target for spamming activities. To stop spammers, Twitter uses Google SafeBrowsing to detect and block spam links. Although blacklists can block malicious URLs embedded in tweets, their lag time hinders their ability to protect users in real time. Researchers have therefore begun to apply various machine learning algorithms to detect Twitter spam. However, there has been no comprehensive evaluation of each algorithm's performance for real-time Twitter spam detection, owing to the lack of a large ground truth. To carry out a thorough evaluation, we collected a large dataset of over 600 million public tweets. We further labelled around 6.5 million spam tweets and extracted 12 lightweight features that can be used for online detection. In addition, we conducted a number of experiments on six machine learning algorithms under various conditions to better understand their effectiveness and weaknesses for timely Twitter spam detection. We will make our labelled dataset available to researchers who are interested in validating or extending our work.
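As a rough illustration of what "lightweight features" for online detection might look like, the sketch below computes a few per-tweet counts directly from the tweet text, with no network lookups or blacklist queries. The abstract does not enumerate the 12 features used in the paper, so the feature names here (`num_urls`, `num_hashtags`, `num_mentions`, `num_digits`, `num_chars`) and the function `extract_features` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical patterns; the abstract does not list the paper's actual features.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def extract_features(text: str) -> dict:
    """Compute cheap per-tweet counts that need only the tweet text."""
    return {
        "num_urls": len(URL_RE.findall(text)),
        "num_hashtags": len(HASHTAG_RE.findall(text)),
        "num_mentions": len(MENTION_RE.findall(text)),
        "num_digits": sum(ch.isdigit() for ch in text),
        "num_chars": len(text),
    }


if __name__ == "__main__":
    # Example tweet text invented for this sketch.
    print(extract_features("Win a FREE phone http://example.com/x #free @you"))
```

Features of this kind can be computed as each tweet arrives, which is what makes them candidates for timely detection; a feature vector like this would then be passed to whichever classifier is being evaluated.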
Pages: 7065-7070
Number of pages: 6