Towards Automated Comprehensive Feature Engineering for Spam Detection

Cited by: 2
Authors
Kiwanuka, Fred N. [1 ]
Alqatawna, Ja'far [1 ,2 ]
Amin, Anang Hudaya Muhamad [1 ]
Paul, Sujni [1 ]
Faris, Hossam [2 ]
Affiliations
[1] Higher Coll Technol, Comp Informat Sci, Dubai, U Arab Emirates
[2] Univ Jordan, King Abdullah II Sch Informat Technol, Amman, Jordan
Keywords
Spam Detection; Dataset Processing; Automated Feature Engineering; Classification; Spam Features; Data Mining; Machine Learning; Python E-mail Feature Extraction and Classification Tool (CPyEFECT)
DOI
10.5220/0007393004290437
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Every day, billions of emails pass through online servers, of which about 59% are spam according to recent research. Spam emails increasingly contain viruses or other malware and pose a security risk to computer systems. Spam filtering and the security of computer systems have become more important than ever. Spam now evolves so rapidly that previously successful detection methods are failing to cope. In this paper, we propose a comprehensive and automated feature engineering framework for spam classification. The proposed framework enables, first, the development of a large number of features from any email corpus and, second, the extraction of automated features using feature transformation and aggregation primitives. We show that spam classification performance improves by between 2% and 28% for almost all conventional machine learning classifiers when automated feature engineering is used. As a by-product of our comprehensive automated feature engineering, we develop a Python-based open-source tool that incorporates the proposed framework.
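To make the two stages in the abstract concrete, the following is a minimal sketch and not the CPyEFECT tool itself. It assumes a handful of hand-picked base features, one ratio-style transformation primitive, and one per-sender mean as an aggregation primitive; all function names and feature names here are hypothetical and use only the Python standard library.

```python
import email
import re
from collections import defaultdict
from statistics import mean


def base_features(raw: str) -> dict:
    """Stage 1 (assumed): extract simple base features from one raw email."""
    msg = email.message_from_string(raw)
    body = msg.get_payload()
    if not isinstance(body, str):  # multipart messages are ignored in this sketch
        body = ""
    words = body.split()
    return {
        "sender": msg.get("From", ""),
        "n_words": len(words),
        "n_urls": len(re.findall(r"https?://", body)),
        "n_upper": sum(1 for w in words if w.isupper()),
        "subject_len": len(msg.get("Subject", "")),
    }


def transform(features: dict) -> dict:
    """Stage 2a (assumed): transformation primitive, ratios of base counts."""
    out = dict(features)
    denom = max(features["n_words"], 1)  # avoid division by zero on empty bodies
    out["url_ratio"] = features["n_urls"] / denom
    out["upper_ratio"] = features["n_upper"] / denom
    return out


def aggregate_by_sender(rows: list, key: str) -> dict:
    """Stage 2b (assumed): aggregation primitive, mean of a feature per sender."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["sender"]].append(row[key])
    return {sender: mean(values) for sender, values in groups.items()}


# Two toy messages from the same (made-up) sender.
EMAILS = [
    "From: a@example.com\nSubject: WIN NOW\n\nCLICK http://x.example to WIN cash\n",
    "From: a@example.com\nSubject: hi\n\nhello there friend see you\n",
]

rows = [transform(base_features(raw)) for raw in EMAILS]
sender_url_ratio = aggregate_by_sender(rows, "url_ratio")
```

The resulting rows, optionally joined with the per-sender aggregates, could then be passed to any conventional classifier; which primitives the paper actually uses is not stated in this record.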
Pages: 429-437
Number of pages: 9