Spam Detection Based on Feature Evolution to Deal with Concept Drift

Cited by: 5
Authors
Henke, Marcia [1]
Santos, Eulanda [2]
Souto, Eduardo [2]
Santin, Altair O. [3]
Affiliations
[1] Fed Univ Santa Maria UFSM, Ind Tech Coll Santa Maria CTISM, Santa Maria, RS, Brazil
[2] Fed Univ Amazonas UFAM, Comp Inst ICOMP, Manaus, Amazonas, Brazil
[3] Pontificia Univ Catolica Parana, Curitiba, PR, Brazil
Keywords
Computer Security Network; Machine Learning; Concept Drift;
DOI
10.3897/jucs.66284
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Electronic messages are still considered the most significant tools in business and personal applications due to their low cost and easy access. However, e-mail has become a major problem owing to the large volume of junk mail, known as spam, which fills users' mailboxes. Several approaches have been proposed to detect spam, such as filters implemented in e-mail servers and user-based spam message classification mechanisms. A major problem with these approaches is spam detection in the presence of concept drift, especially as a result of changes in features over time. To overcome this problem, this work proposes a new spam detection system based on analyzing the evolution of features. The proposed method is divided into three steps: 1) spam classification model training; 2) concept drift detection; and 3) knowledge transfer learning. The first step generates classification models, as commonly done in machine learning. The second step introduces a new strategy to deal with concept drift, SFS (Similarity-based Features Selection), which analyzes the evolution of the features by taking into account the similarity between the feature vectors extracted from training data and test data. Finally, the third step focuses on the following questions: what, how, and when to transfer acquired knowledge? The proposed method is evaluated using two public datasets. The results of the experiments show that it is possible to infer a threshold for detecting changes (drift), which ensures that the spam classification model is updated through knowledge transfer. Moreover, our anomaly detection system is able to perform spam classification and concept drift detection as two parallel and independent tasks.
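The drift detection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the SFS formula, so everything below is an assumption made for illustration: cosine similarity over aggregated term counts stands in for the similarity measure, whitespace tokenization stands in for feature extraction, and the names `term_frequencies`, `cosine_similarity`, `drift_detected`, and the default `threshold=0.5` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def term_frequencies(messages):
    """Aggregate token counts over a batch of messages (naive whitespace tokens)."""
    counts = Counter()
    for text in messages:
        counts.update(text.lower().split())
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty batch shares nothing with any other batch
    return dot / (norm_a * norm_b)


def drift_detected(train_messages, test_messages, threshold=0.5):
    """Flag drift when train/test feature similarity falls below the threshold.

    The threshold value is an assumed placeholder; the abstract only states
    that a threshold can be inferred from experiments.
    """
    sim = cosine_similarity(
        term_frequencies(train_messages), term_frequencies(test_messages)
    )
    return sim < threshold, sim
```

In this sketch, a flagged batch would be the signal to update the classification model (the knowledge transfer step). The check does not call the classifier, which loosely mirrors the abstract's point that classification and drift detection can run as independent tasks.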
Pages: 364-386
Page count: 23
Related papers
50 in total
  • [21] Term Space Partition Based Ensemble Feature Construction for Spam Detection
    Mi, Guyue
    Gao, Yang
    Tan, Ying
    DATA MINING AND BIG DATA, DMBD 2016, 2016, 9714 : 205 - 216
  • [22] Catching the drift: Using feature-free case-based reasoning for spam filtering
    Delany, Sarah Jane
    Bridge, Derek
    CASE-BASED REASONING RESEARCH AND DEVELOPMENT, PROCEEDINGS, 2007, 4626 : 314 - +
  • [23] Variable Length Concentration based Feature Construction Method for Spam Detection
    Gao, Yang
    Mi, Guyue
    Tan, Ying
    2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2015,
  • [24] Towards Online Concept Drift Detection with Feature Selection for Data Stream Classification
    Hammoodi, Mahmood
    Stahl, Frederic
    Tennant, Mark
    ECAI 2016: 22ND EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, 285 : 1549 - 1550
  • [25] Concept Drift Detection Based on Equal Density Estimation
    Gu, Feng
    Zhang, Guangquan
    Lu, Jie
    Lin, Chin-Teng
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 24 - 30
  • [26] Concept Drift Based on Subspace Learning for Intrusion Detection
    Wu, Bin
    Lin, Hai-Zhuo
    Feng, Lin
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND INFORMATION SYSTEMS, 2016, 52 : 421 - 425
  • [27] Concept Drift Class Detection Based on Time Window
    Guo H.
    Ren Q.
    Wang W.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2022, 59 (01) : 127 - 143
  • [28] Opinion Spam Detection Using Feature Selection
    Patel, Rinki
    Thakkar, Priyank
    2014 6TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS, 2014, : 560 - 564
  • [29] Dynamic Feature Selection for Spam Detection in Twitter
    Karakasli, M. Salih
    Aydin, Muhammed Ali
    Yarkan, Serhan
    Boyaci, Ali
    INTERNATIONAL TELECOMMUNICATIONS CONFERENCE, ITELCON 2017, 2019, 504 : 239 - 250
  • [30] CONCEPT DRIFT AND EVOLUTION DETECTION IN FUSION DIAGNOSIS WITH EVOLVING DATA STREAMS
    Abdolsamadi, Amirmahyar
    Wang, Pingfeng
    PROCEEDINGS OF THE ASME INTERNATIONAL DESIGN ENGINEERING TECHNICAL CONFERENCES AND COMPUTERS AND INFORMATION IN ENGINEERING CONFERENCE, 2017, VOL 2A, 2017,