Birds of prey: identifying lexical irregularities in spam on Twitter

Cited by: 0
Authors
Kyle Robinson
Vijay Mago
Affiliation
[1] Lakehead University
Source
Wireless Networks | 2022 / Vol. 28, No. 3
Keywords
Twitter; Spam; URLs; Machine learning; Natural language processing
DOI
Not available
Abstract
The advent of spam on social media platforms has led to a number of problems, not only for social media users but also for researchers mining social media data. While there has been substantial research on automated methods of spam detection on Twitter, research on the lexical content of spam on the platform is limited. A dataset of 301 million generic tweets was filtered through a URL blacklisting service to obtain 7207 tweets containing links to malicious web pages. These tweets, considered spam, were combined with an equally sized random sample of non-spam tweets to obtain an overall dataset of 14,414 tweets. A total of 12 numerical tweet features were used to train and test a Random Forest classifier, which achieved an overall classification accuracy of over 90%. In addition to the numerical features, the text of each tweet was processed to create four frequency-mapped corpora (words, emoji, numbers, and stop-words), each built separately for spam and non-spam data. The spam and non-spam versions of each corpus were plotted against each other to visualize differences in usage between the two groups. A clear distinction was observed between the words and emoji used in spam and in non-spam tweets.
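The abstract does not state how tweets were tokenized or which stop-word list was used, so the following is only a minimal Python sketch of building frequency-mapped corpora and comparing them across the two classes. It assumes a simple regex tokenizer, a small hypothetical stop-word list, a rough Unicode-range test for emoji, and normalization of counts by corpus size; the names `build_corpora`, `compare_frequencies`, and `STOP_WORDS` are illustrative and not taken from the paper. The Random Forest step on the 12 numerical features is not reproduced here.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; the paper's list is not given here.
STOP_WORDS = {"a", "an", "the", "and", "or", "to", "of", "in", "on", "at",
              "for", "is", "it", "this", "you", "your"}

# Rough emoji test (assumption): two common Unicode symbol ranges. Multi-codepoint
# emoji (skin tones, flags, ZWJ sequences) are split or partly dropped.
EMOJI_CLASS = "[\U0001F300-\U0001FAFF\u2600-\u27BF]"
EMOJI_RE = re.compile(EMOJI_CLASS)
# A token is either a run of word characters/apostrophes or a single emoji.
TOKEN_RE = re.compile(r"[\w']+|" + EMOJI_CLASS)


def build_corpora(tweets):
    """Map a list of tweet strings to four frequency Counters."""
    corpora = {name: Counter() for name in ("words", "emoji", "numbers", "stop_words")}
    for tweet in tweets:
        for token in TOKEN_RE.findall(tweet.lower()):
            if EMOJI_RE.fullmatch(token):
                corpora["emoji"][token] += 1
            elif token.isdigit():
                corpora["numbers"][token] += 1
            elif token in STOP_WORDS:
                corpora["stop_words"][token] += 1
            else:
                corpora["words"][token] += 1
    return corpora


def compare_frequencies(spam_counts, ham_counts, top_n=10):
    """Pair each token's relative frequency in spam with that in non-spam.

    Rows are (token, spam_freq, ham_freq), sorted so that the tokens with the
    largest gap between the two groups come first.
    """
    spam_total = sum(spam_counts.values()) or 1  # avoid division by zero
    ham_total = sum(ham_counts.values()) or 1
    tokens = set(spam_counts) | set(ham_counts)
    rows = [(t, spam_counts[t] / spam_total, ham_counts[t] / ham_total) for t in tokens]
    rows.sort(key=lambda row: abs(row[1] - row[2]), reverse=True)
    return rows[:top_n]


if __name__ == "__main__":
    spam = build_corpora(["WIN a free iPhone now 🔥🔥 click 100 times", "free money 💰 today"])
    ham = build_corpora(["had a great day at the lake", "the game starts at 7"])
    for token, spam_freq, ham_freq in compare_frequencies(spam["words"], ham["words"], top_n=3):
        print(f"{token:>8}  spam={spam_freq:.3f}  non-spam={ham_freq:.3f}")
```

Plotting the spam and non-spam frequency columns against each other (for example as a scatter plot) would give a comparison of the kind the abstract describes: tokens far from the diagonal are used disproportionately by one group. Normalizing by corpus size is an assumption made here so that corpora with different total token counts remain comparable.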
Pages: 1189-1196
Page count: 7