Drifted Twitter Spam Classification Using Multiscale Detection Test on K-L Divergence

Cited by: 20
Authors
Wang, Xuesong [1]
Kang, Qi [1,2]
An, Jing [3]
Zhou, Mengchu [4]
Affiliations
[1] Tongji Univ, Sch Elect & Informat Engn, Dept Control Sci & Engn, Shanghai 201804, Peoples R China
[2] Tongji Univ, Shanghai Inst Intelligent Sci & Technol, Shanghai 201804, Peoples R China
[3] Shanghai Inst Technol, Sch Elect & Elect Engn, Shanghai 201418, Peoples R China
[4] New Jersey Inst Technol, Helen & John C Hartmann Dept Elect & Comp Engn, Newark, NJ 07102 USA
Funding
National Natural Science Foundation of China;
Keywords
Concept drift; drift detection test; Twitter spam classification; K-L divergence; ONLINE;
DOI
10.1109/ACCESS.2019.2932018
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Twitter spam classification is a tough challenge for social media platforms and cyber security companies. Twitter spam with illegal links may evolve over time in order to deceive filtering models, causing disastrous losses to both users and the whole network. We define this distributional evolution as a concept drift scenario. To build an effective model, we adopt K-L divergence to represent the spam distribution and use a multiscale drift detection test (MDDT) to localize possible drifts therein. A base classifier is then retrained based on the detection result to improve performance. Comprehensive experiments show that K-L divergence exhibits highly consistent change patterns across features when a drift occurs. The MDDT is also shown to be effective in improving the final classification results in terms of accuracy, recall, and F-measure.
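The idea of using K-L divergence as a drift signal can be illustrated with a minimal sketch. It is not the MDDT procedure from the paper, whose details are not given in this record. It only compares the histogram of one feature in a reference window against a later window, on synthetic data. The function names, the bin count, the smoothing constant `eps`, and the Gaussian sample data are all assumptions made for illustration.

```python
import math
import random


def histogram(values, lo, hi, bins):
    """Count values into `bins` equal-width bins over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1  # clamp to the outer bins
    return counts


def kl_divergence(p_counts, q_counts, eps=1e-9):
    """K-L divergence D(P || Q) between two count vectors.

    `eps` is an assumed smoothing constant that keeps empty bins
    from producing log(0) or division by zero.
    """
    p = [c + eps for c in p_counts]
    q = [c + eps for c in q_counts]
    p_sum, q_sum = sum(p), sum(q)
    return sum(
        (pi / p_sum) * math.log((pi / p_sum) / (qi / q_sum))
        for pi, qi in zip(p, q)
    )


def window_kl(reference, current, bins=10):
    """K-L divergence between the histograms of two sample windows."""
    lo = min(min(reference), min(current))
    hi = max(max(reference), max(current))
    return kl_divergence(
        histogram(reference, lo, hi, bins),
        histogram(current, lo, hi, bins),
    )


# Synthetic stand-in for one feature: a reference window, a later window
# drawn from the same distribution, and a later window whose mean shifted.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(2000)]
same = [rng.gauss(0.0, 1.0) for _ in range(2000)]
drifted = [rng.gauss(1.5, 1.0) for _ in range(2000)]

kl_same = window_kl(reference, same)
kl_drifted = window_kl(reference, drifted)
print(f"no drift: {kl_same:.4f}   drifted: {kl_drifted:.4f}")
```

In this sketch, a window whose divergence from the reference rises well above the no-drift level would be a candidate point for retraining the base classifier. How the threshold and the window scales are chosen is left open here.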
Pages: 108384-108394
Page count: 11
Related papers
50 in total
  • [1] Improving PART algorithm with K-L divergence for imbalanced classification
    Su, Chong
    Ju, Shenggen
    Liu, Yiguang
    Yu, Zhonghua
    INTELLIGENT DATA ANALYSIS, 2015, 19 (05) : 1035 - 1048
  • [2] Statistical Features-Based Real-Time Detection of Drifted Twitter Spam
    Chen, Chao
    Wang, Yu
    Zhang, Jun
    Xiang, Yang
    Zhou, Wanlei
    Min, Geyong
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2017, 12 (04) : 914 - 925
  • [3] Adaptive Classification for Spam Detection on Twitter with Specific Data
    Dangkesee, Thayakorn
    Puntheeranurak, Sutheera
    2017 21ST INTERNATIONAL COMPUTER SCIENCE AND ENGINEERING CONFERENCE (ICSEC 2017), 2017, : 243 - 246
  • [4] Robust fault-detection based on residual K-L divergence for wind turbines
    Zhang, Yuxian
    Wang, Kefeng
    Qian, Xiaoyi
    Gendeel, Mohammed
    IET RENEWABLE POWER GENERATION, 2019, 13 (13) : 2400 - 2408
  • [5] Twitter spam account detection based on clustering and classification methods
    Kayode Sakariyah Adewole
    Tao Han
    Wanqing Wu
    Houbing Song
    Arun Kumar Sangaiah
    The Journal of Supercomputing, 2020, 76 : 4802 - 4837
  • [6] Twitter spam account detection based on clustering and classification methods
    Adewole, Kayode Sakariyah
    Han, Tao
    Wu, Wanqing
    Song, Houbing
    Sangaiah, Arun Kumar
    JOURNAL OF SUPERCOMPUTING, 2020, 76 (07) : 4802 - 4837
  • [7] Provably secure steganography: Achieving zero K-L divergence using statistical restoration
    Solanki, K.
    Sullivan, K.
    Madhow, U.
    Manjunath, B. S.
    Chandrasekaran, S.
    2006 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP 2006, PROCEEDINGS, 2006, : 125 - +
  • [8] K-L Divergence-Based Detection of Attacks on Remote Control: The Utilization of Local Information
    Qu, Fuyi
    Yang, Nachuan
    Liu, Hao
    Li, Yuzhe
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2024, 20 (06) : 9070 - 9079
  • [9] Twitter Spam Detection Using Naive Bayes Classifier
    Santoshi, K. Ushasree
    Bhavya, S. Sree
    Sri, Y. Bhavya
    Venkateswarlu, B.
    PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON INVENTIVE COMPUTATION TECHNOLOGIES (ICICT 2021), 2021, : 773 - 777
  • [10] Threshold and Associative Based Classification for Social Spam Profile Detection on Twitter
    Hua, Willian
    Zhang, Yanqing
    2013 NINTH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRIDS (SKG), 2013, : 113 - 120