A review of spam email detection: analysis of spammer strategies and the dataset shift problem

Cited by: 19
Authors
Janez-Martino, Francisco [1, 2]
Alaiz-Rodriguez, Rocio [1, 2]
Gonzalez-Castro, Victor [1, 2]
Fidalgo, Eduardo [1, 2]
Alegre, Enrique [1, 2]
Affiliations
[1] Univ Leon, Dept Elect Syst & Automat, Leon, Spain
[2] INCIBE Spanish Natl Cybersecur Inst, Leon, Spain
Keywords
Spam email detection; Dataset shift; Adversarial machine learning; Spammer strategies; Feature selection; Concept drift; Classification; Patterns
DOI
10.1007/s10462-022-10195-4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Spam emails have been traditionally seen as just annoying and unsolicited emails containing advertisements, but they increasingly include scams, malware or phishing. In order to ensure the security and integrity for the users, organisations and researchers aim to develop robust filters for spam email detection. Recently, most spam filters based on machine learning algorithms published in academic journals report very high performance, but users are still reporting a rising number of frauds and attacks via spam emails. Two main challenges can be found in this field: (a) it is a very dynamic environment prone to the dataset shift problem and (b) it suffers from the presence of an adversarial figure, i.e. the spammer. Unlike classical spam email reviews, this one is particularly focused on the problems that this constantly changing environment poses. Moreover, we analyse the different spammer strategies used for contaminating the emails, and we review the state-of-the-art techniques to develop filters based on machine learning. Finally, we empirically evaluate and present the consequences of ignoring the matter of dataset shift in this practical field. Experimental results show that this shift may lead to severe degradation in the estimated generalisation performance, with error rates reaching values up to 48.81%.
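The effect of dataset shift described in the abstract can be illustrated with a minimal sketch. It is not the authors' experimental setup: the tiny Naive Bayes filter, the function names (`train`, `predict`, `error_rate`) and the hand-written messages are all assumptions made for illustration. The filter is fitted on "old" spam and then scored on two test sets, one drawn from the same vocabulary and one where the spammer obfuscates keywords and pads the message with legitimate-looking words.

```python
import math
from collections import Counter

def train(messages):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    counts = {"spam": Counter(), "ham": Counter()}
    docs = Counter()
    for text, label in messages:
        docs[label] += 1
        counts[label].update(text.lower().split())
    vocab = set(counts["spam"]) | set(counts["ham"])
    return counts, docs, vocab

def predict(model, text):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    best_label, best_score = None, -math.inf
    for label in ("spam", "ham"):
        score = math.log(docs[label] / total_docs)
        n_tokens = sum(counts[label].values())
        for tok in text.lower().split():
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((counts[label][tok] + 1) / (n_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def error_rate(model, messages):
    """Fraction of messages whose predicted label differs from the true one."""
    wrong = sum(predict(model, text) != label for text, label in messages)
    return wrong / len(messages)

# Hypothetical toy data, invented for this sketch.
train_set = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap pills free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
    ("lunch on friday with the team", "ham"),
]
same_distribution_test = [
    ("claim free money", "spam"),
    ("cheap offer now", "spam"),
    ("project meeting on monday", "ham"),
    ("review the agenda please", "ham"),
]
shifted_test = [
    ("w1n fr33 m0ney for the team", "spam"),        # obfuscated keywords plus ham-like padding
    ("project report cl4im your pr1ze", "spam"),
    ("meeting agenda for friday", "ham"),
    ("please review the lunch offer", "ham"),
]

model = train_set_model = train(train_set)
in_distribution_error = error_rate(model, same_distribution_test)
shifted_error = error_rate(model, shifted_test)
print("error, same distribution:", in_distribution_error)
print("error, shifted distribution:", shifted_error)
```

On this toy data the filter makes no mistakes on the same-distribution set but misclassifies both obfuscated spam messages in the shifted set. The numbers have no connection to the 48.81% figure reported in the abstract; the sketch only shows why an error estimate obtained on data resembling the training set can be optimistic once the spammer changes strategy.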
Pages: 1145 - 1173
Page count: 29
Related papers
34 in total
  • [1] A review of spam email detection: analysis of spammer strategies and the dataset shift problem
    Francisco Jáñez-Martino
    Rocío Alaiz-Rodríguez
    Víctor González-Castro
    Eduardo Fidalgo
    Enrique Alegre
    [J]. Artificial Intelligence Review, 2023, 56 : 1145 - 1173
  • [2] If it looks like a spammer and behaves like a spammer, it must be a spammer: analysis and detection of microblogging spam accounts
    Abdullah Almaatouq
    Erez Shmueli
    Mariam Nouh
    Ahmad Alabdulkareem
    Vivek K. Singh
    Mansour Alsaleh
    Abdulrahman Alarifi
    Anas Alfaris
    Alex ‘Sandy’ Pentland
    [J]. International Journal of Information Security, 2016, 15 : 475 - 491
  • [4] Spam Review Detection Using the Linguistic and Spammer Behavioral Methods
    Hussain, Naveed
    Mirza, Hamid Turab
    Hussain, Ibrar
    Iqbal, Faiza
    Memon, Imran
    [J]. IEEE ACCESS, 2020, 8 : 53801 - 53816
  • [5] Email Shape Analysis for Spam Botnet Detection
    Sroufe, Paul
    Phithakkitnukoon, Santi
    Dantu, Ram
    Cangussu, Joao
    [J]. 2009 6TH IEEE CONSUMER COMMUNICATIONS AND NETWORKING CONFERENCE, VOLS 1 AND 2, 2009, : 1074 - +
  • [6] Email Spam Detection by Machine Learning Approaches: A Review
    Hadi, Mohammad Talib
    Baawi, Salwa Shakir
    [J]. FORTHCOMING NETWORKS AND SUSTAINABILITY IN THE AIOT ERA, VOL 1, FONES-AIOT 2024, 2024, 1035 : 186 - 204
  • [7] Detecting Spam Review through Spammer's Behavior Analysis
    Hussain, Naveed
    Mirza, Hamid Turab
    Hussain, Ibrar
    [J]. ADCAIJ-ADVANCES IN DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE JOURNAL, 2019, 8 (02) : 61 - 71
  • [8] Detection of Zombie PCs Based on Email Spam Analysis
    Jeong, HyunCheol
    Kim, Huy Kang
    Lee, Sangjin
    Kim, Eunjin
    [J]. KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2012, 6 (05) : 1445 - 1462
  • [9] Advances in spam detection for email spam, web spam, social network spam, and review spam: ML-based and nature-inspired-based techniques
    Akinyelu, Andronicus A.
    [J]. JOURNAL OF COMPUTER SECURITY, 2021, 29 (05) : 473 - 529
  • [10] VGI and crowdsourced data credibility analysis using spam email detection techniques
    Koswatte, Saman
    McDougall, Kevin
    Liu, Xiaoye
    [J]. INTERNATIONAL JOURNAL OF DIGITAL EARTH, 2018, 11 (05) : 520 - 532