Birds of prey: identifying lexical irregularities in spam on Twitter

Cited by: 0
Authors
Kyle Robinson
Vijay Mago
Affiliations
[1] Lakehead University
Source
Wireless Networks | 2022 / Volume 28
Keywords
Twitter; Spam; URLs; Machine learning; Natural language processing
DOI
Not available
Abstract
The advent of spam on social media platforms has led to a number of problems, not only for social media users but also for researchers mining social media data. While there has been substantial research on automated methods of spam detection on Twitter, research on the lexical content of spam on the platform is limited. A dataset of 301 million generic tweets was filtered through a URL blacklisting service to obtain 7207 tweets containing links to malicious web pages. These tweets, considered spam, were combined with an equal number of randomly sampled non-spam tweets to obtain an overall dataset of 14,414 tweets. Twelve numerical tweet features were used to train and test a Random Forest classifier, which achieved an overall classification accuracy of over 90%. In addition to the numerical features, the text of each tweet was processed to create four frequency-mapped corpora pertaining uniquely to spam and non-spam data. The corpora of words, emoji, numbers, and stop-words for spam and non-spam were plotted against each other to visualize differences in usage between the two groups. A clear distinction was observed between the words and emoji used in spam tweets and those used in non-spam tweets.
Pages: 1189-1196
Number of pages: 7
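
The frequency-mapping step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokeniser, the short stop-word list, the Unicode ranges used to spot emoji, and all function names (`build_corpora`, `relative_frequency`, `is_emoji`) are assumptions made for illustration only.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; the paper's list is not given here).
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "in", "is", "for", "on", "at", "with"}

URL_RE = re.compile(r"https?://\S+")
# Words (with an optional internal apostrophe) or runs of digits.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?|\d+")


def is_emoji(ch):
    """Crude emoji test over a few common Unicode blocks (assumption)."""
    cp = ord(ch)
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF


def build_corpora(tweets):
    """Return four frequency maps: words, emoji, numbers, stop_words."""
    corpora = {name: Counter() for name in ("words", "emoji", "numbers", "stop_words")}
    for text in tweets:
        # Drop URLs so their fragments are not counted as words.
        cleaned = URL_RE.sub(" ", text)
        for token in TOKEN_RE.findall(cleaned.lower()):
            if token.isdigit():
                corpora["numbers"][token] += 1
            elif token in STOP_WORDS:
                corpora["stop_words"][token] += 1
            else:
                corpora["words"][token] += 1
        corpora["emoji"].update(ch for ch in cleaned if is_emoji(ch))
    return corpora


def relative_frequency(counter):
    """Normalise raw counts so groups of different size can be compared."""
    total = sum(counter.values())
    return {tok: n / total for tok, n in counter.items()} if total else {}


# Made-up example tweets, not data from the paper.
spam = ["WIN a FREE iPhone 🎁 click http://example.com/x now", "free followers 100 🎁🎁"]
ham = ["Had a great coffee with friends ☕", "the game starts at 7"]

spam_corpora = build_corpora(spam)
ham_corpora = build_corpora(ham)
spam_word_freq = relative_frequency(spam_corpora["words"])
ham_word_freq = relative_frequency(ham_corpora["words"])
```

Plotting `spam_word_freq` against `ham_word_freq` for each token (and likewise for the emoji, number, and stop-word maps) would give the kind of side-by-side usage comparison the abstract describes.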