Spam Detection Based on Feature Evolution to Deal with Concept Drift

Cited by: 5
Authors
Henke, Marcia [1]
Santos, Eulanda [2]
Souto, Eduardo [2]
Santin, Altair O. [3]
Affiliations
[1] Fed Univ Santa Maria UFSM, Ind Tech Coll Santa Maria CTISM, Santa Maria, RS, Brazil
[2] Fed Univ Amazonas UFAM, Comp Inst ICOMP, Manaus, Amazonas, Brazil
[3] Pontificia Univ Catolica Parana, Curitiba, PR, Brazil
Keywords
Computer Security Network; Machine Learning; Concept Drift
DOI
10.3897/jucs.66284
Chinese Library Classification number
TP31 [Computer Software]
Subject classification code
081202; 0835
Abstract
Electronic messages are still considered the most significant tools in business and personal applications due to their low cost and easy access. However, e-mail has become a major problem owing to the large volume of junk mail, known as spam, which fills users' mailboxes. Several approaches have been proposed to detect spam, such as filters implemented in e-mail servers and user-based spam message classification mechanisms. A major problem with these approaches is spam detection in the presence of concept drift, especially as a result of changes in features over time. To overcome this problem, this work proposes a new spam detection system based on analyzing the evolution of features. The proposed method is divided into three steps: 1) spam classification model training; 2) concept drift detection; and 3) knowledge transfer learning. The first step generates classification models, as is commonly done in machine learning. The second step introduces a new strategy to deal with concept drift: SFS (Similarity-based Features Selection), which analyzes the evolution of the features by taking into account the similarity between the feature vectors extracted from training data and test data. Finally, the third step focuses on the following questions: what, how, and when to transfer acquired knowledge? The proposed method is evaluated using two public datasets. The results of the experiments show that it is possible to infer a threshold to detect changes (drift) in order to ensure that the spam classification model is updated through knowledge transfer. Moreover, our anomaly detection system is able to perform spam classification and concept drift detection as two parallel and independent tasks.
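The drift-detection idea in the abstract (compare feature vectors from training data and test data, then apply a threshold) can be illustrated with a minimal sketch. This is not the authors' SFS procedure: the bag-of-words features, the cosine similarity measure, the default threshold of 0.5, and the function names `feature_vector`, `cosine_similarity`, and `drift_detected` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def feature_vector(messages):
    """Relative term frequencies over a batch of messages (bag of words)."""
    counts = Counter(word for msg in messages for word in msg.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def drift_detected(train_messages, test_messages, threshold=0.5):
    """Flag drift when batch similarity falls below the threshold.

    The threshold is a hypothetical value; the abstract only states that
    a threshold can be inferred, not what it is.
    """
    sim = cosine_similarity(
        feature_vector(train_messages), feature_vector(test_messages)
    )
    return sim < threshold, sim
```

In a pipeline of this shape, a `True` flag would be the trigger for the third step (updating the classifier through knowledge transfer), while the classifier itself keeps running independently of the drift check.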
Pages: 364-386
Page count: 23
Related papers
50 in total
  • [31] Applying lazy learning algorithms to tackle concept drift in spam filtering
    Fdez-Riverola, F.
    Iglesias, E. L.
    Diaz, F.
    Mendez, J. R.
    Corchado, J. M.
    EXPERT SYSTEMS WITH APPLICATIONS, 2007, 33 (01) : 36 - 48
  • [32] An efficient incremental learning mechanism for tracking concept drift in spam filtering
    Sheu, Jyh-Jian
    Chu, Ko-Tsung
    Li, Nien-Feng
    Lee, Cheng-Chi
    PLOS ONE, 2017, 12 (02):
  • [33] ECUE: A Spam Filter that Uses Machine Learning to Track Concept Drift
    Delany, Sarah Jane
    Cunningham, Padraig
    Smyth, Barry
    ECAI 2006, PROCEEDINGS, 2006, 141 : 627 - +
  • [34] Using machine learning to deal with Phishing and Spam Detection: An overview
    El Kouari, Oumaima
    Benaboud, Hafssa
    Lazaar, Saiida
    3RD INTERNATIONAL CONFERENCE ON NETWORKING, INFORMATION SYSTEM & SECURITY (NISS'20), 2020,
  • [35] Improving Email Spam Detection Using Content Based Feature Engineering Approach
    Hijawi, Wadi'
    Faris, Hossam
    Alqatawna, Ja'far
    Al-Zoubi, Ala' M.
    Aljarah, Ibrahim
    2017 IEEE JORDAN CONFERENCE ON APPLIED ELECTRICAL ENGINEERING AND COMPUTING TECHNOLOGIES (AEECT), 2017,
  • [36] Detection & management of concept drift
    Mak, Lee-Onn
    Krause, Paul
    PROCEEDINGS OF 2006 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2006, : 3486 - +
  • [37] A consensus pattern of content feature and link feature for web spam detection
    Gao, Shuang
    Zhang, Huaxiang
    Liu, Li
    Fang, Xiaonan
    Zhang, H. (824223485@163.com), 1600, Binary Information Press (10): : 3759 - 3766
  • [38] Using Evolving Ensembles to Deal with Concept Drift in Streaming Scenarios
    Ramos, Diogo
    Carneiro, Davide
    Novais, Paulo
    INTELLIGENT DISTRIBUTED COMPUTING XIV, 2022, 1026 : 59 - 68
  • [39] Drift-detection Based Incremental Ensemble for Reacting to Different Kinds of Concept Drift
    Li, Zeng
    Xiong, Yan
    Huang, Wenchao
    5TH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING AND COMMUNICATIONS (BIGCOM 2019), 2019, : 107 - 114
  • [40] Handling Concept Drift and Feature Evolution in Textual Data Stream Using the Artificial Immune System
    Abid, Amal
    Jamoussi, Salma
    Ben Hamadou, Abdelmajid
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2018, PT I, 2018, 11055 : 363 - 372