BEAN: a BEhavior ANalysis approach of URL spam filtering in Twitter

Cited by: 5
Authors
Wang, De [1]
Pu, Calton [1]
Affiliation
[1] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
Funding
National Science Foundation (USA)
DOI
10.1109/IRI.2015.69
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Social websites such as Twitter and Facebook strive to detect and remove URL spam in order to retain their users. Researchers have proposed many filtering approaches, such as SpamRank and TrustRank, most of which detect URL spam through content analysis of the linked Web pages or link analysis of the Web graph. Automatically detecting URL spam in social media nevertheless remains challenging, because spammers keep evolving their techniques, for example cloaking based on IP addresses, using multiple user accounts, and using redirectors. In this paper, we introduce BEAN, a behavior analysis technique that detects URL spam by capturing the anomalous message-sending behaviors of spammers. Twitter is well suited to our analysis because of its popularity and real-time properties. We collected over 2.4 million tweets from around one million users, based on Twitter trending topics, over 4 months. Applying our behavior analysis approach, derived from a Markov chain model, to this Twitter dataset achieves a precision of 0.91 and a recall of 0.88. In doing so, we detected a large amount of URL spam that conventional approaches such as SVM and TrustRank cannot filter out, indicating that our approach is a good complement to existing URL spam detection techniques. We further investigate the anomalous behavior patterns that spammers exhibit when spreading URL spam, in order to confirm our assumption.
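The abstract does not specify BEAN's states, features, or decision rule, so the following is only a minimal sketch of how a first-order Markov chain could flag anomalous message-sending behavior in general. The state names (`plain`, `url`, `reply`), the toy training sequences, the smoothing constant, and the threshold are all hypothetical and are not taken from the paper. The idea illustrated is: estimate transition probabilities from behavior sequences assumed to be normal, then treat a sequence with a low average log-likelihood under that model as suspicious.

```python
import math

# Hypothetical action states for one account's message stream (not from the paper).
STATES = ("plain", "url", "reply")


def fit_transition_probs(sequences, states, alpha=1.0):
    """Estimate first-order transition probabilities with add-alpha smoothing."""
    counts = {s: {t: 0.0 for t in states} for s in states}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1.0
    probs = {}
    for s in states:
        total = sum(counts[s].values()) + alpha * len(states)
        probs[s] = {t: (counts[s][t] + alpha) / total for t in states}
    return probs


def avg_log_likelihood(seq, probs):
    """Mean log-probability per transition; lower values are less typical."""
    steps = list(zip(seq, seq[1:]))
    if not steps:
        return 0.0
    return sum(math.log(probs[a][b]) for a, b in steps) / len(steps)


def is_anomalous(seq, probs, threshold):
    """Flag a sequence whose average log-likelihood falls below the threshold."""
    return avg_log_likelihood(seq, probs) < threshold


# Toy sequences standing in for accounts assumed to behave normally.
normal_sequences = [
    ["plain", "reply", "plain", "url", "plain", "reply"],
    ["reply", "plain", "plain", "reply", "url", "plain"],
    ["plain", "plain", "reply", "plain", "url", "reply"],
]
probs = fit_transition_probs(normal_sequences, STATES)

typical = ["plain", "reply", "plain", "url", "plain"]
bursty = ["url", "url", "url", "url", "url"]  # repeated URL posts in a row
THRESHOLD = -1.2  # illustrative cut-off; in practice tuned on labeled data

print(is_anomalous(typical, probs, THRESHOLD))
print(is_anomalous(bursty, probs, THRESHOLD))
```

In this toy setup the repeated URL-to-URL transitions never occur in the training sequences, so the bursty sequence receives a lower average log-likelihood than the typical one. A real system would need labeled data to choose the threshold and to measure precision and recall.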
Pages: 403-410
Page count: 8
Related papers
50 in total
  • [1] Spam filtering with dynamically updated URL statistics
    Kim, Jangbok
    Chung, Kihyun
    Choi, Kyunghee
    [J]. IEEE SECURITY & PRIVACY, 2007, 5 (04) : 33 - 39
  • [2] An Integrated Approach to Spam Classification on Twitter Using URL Analysis, Natural Language Processing and Machine Learning Techniques
    Kandasamy, Kamalanathan
    Koroth, Preethi
    [J]. 2014 IEEE STUDENTS' CONFERENCE ON ELECTRICAL, ELECTRONICS AND COMPUTER SCIENCE (SCEECS), 2014,
  • [3] Spam mail filtering through dynamically updating URL statistics
    Kim, J
    Choi, K
    Jung, G
    [J]. IASTED International Conference on Web Technologies, Applications, and Services, 2005, : 41 - 47
  • [4] An Approach to URL Filtering in SDN
    Janani, K. Archana
    Vetriselvi, V.
    Parthasarathi, Ranjani
    Rao, G. Subrahmanya V. R. K.
    [J]. INTERNATIONAL CONFERENCE ON COMPUTER NETWORKS AND COMMUNICATION TECHNOLOGIES (ICCNCT 2018), 2019, 15 : 217 - 228
  • [5] Design and Evaluation of a Real-Time URL Spam Filtering Service
    Thomas, Kurt
    Grier, Chris
    Ma, Justin
    Paxson, Vern
    Song, Dawn
    [J]. 2011 IEEE SYMPOSIUM ON SECURITY AND PRIVACY (SP 2011), 2011, : 447 - 462
  • [6] A Hybrid Approach for Spam Detection for Twitter
    Mateen, Malik
    Aleem, Muhammad
    Iqbal, Muhammad Azhar
    Islam, Muhammad Arshad
    [J]. PROCEEDINGS OF 2017 14TH INTERNATIONAL BHURBAN CONFERENCE ON APPLIED SCIENCES AND TECHNOLOGY (IBCAST), 2017, : 466 - 471
  • [7] Near-duplicate mail detection based on URL information for spam filtering
    Yeh, Chun-Chao
    Lin, Chia-Hui
    [J]. Information Networking: ADVANCES IN DATA COMMUNICATIONS AND WIRELESS NETWORKS, 2006, 3961 : 842 - 851
  • [8] Spam Filtering in Twitter Using Sender-Receiver Relationship
    Song, Jonghyuk
    Lee, Sangho
    Kim, Jong
    [J]. RECENT ADVANCES IN INTRUSION DETECTION, 2011, 6961 : 301 - +
  • [9] Filtering Spam in Social Tagging System with Dynamic Behavior Analysis
    Liu, Bo
    Zhai, Ennan
    Sun, Huiping
    Chen, Yelu
    Chen, Zhong
    [J]. 2009 INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING, 2009, : 95 - 100
  • [10] Ecosystem of Spamming on Twitter: Analysis of Spam Reporters and Spam Reportees
    Sinha, Pooja
    Maini, Oshin
    Malik, Gunjan
    Kaushal, Rishabh
    [J]. 2016 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2016, : 1705 - 1710