Towards Automated Comprehensive Feature Engineering for Spam Detection

Cited by: 2
Authors
Kiwanuka, Fred N. [1]
Alqatawna, Ja'far [1, 2]
Amin, Anang Hudaya Muhamad [1]
Paul, Sujni [1]
Faris, Hossam [2]
Affiliations
[1] Higher Colleges of Technology, Computer Information Science, Dubai, United Arab Emirates
[2] University of Jordan, King Abdullah II School of Information Technology, Amman, Jordan
Keywords
Spam Detection; Dataset Processing; Automated Feature Engineering; Classification; Spam Features; Data Mining; Machine Learning; Python E-mail Feature Extraction and Classification Tool (CPyEFECT)
DOI
10.5220/0007393004290437
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Every day, billions of emails pass through online servers, of which about 59% are spam according to recent research. Spam emails increasingly contain viruses or other malware and pose a security risk to computer systems. Spam filtering and the security of computer systems have become more important than ever. Spam now evolves so quickly that previously successful detection methods fail to cope. In this paper, we propose a comprehensive and automated feature engineering framework for spam classification. The proposed framework enables, first, the development of a large number of features from any email corpus and, second, the automated extraction of features using feature transformation and aggregation primitives. We show that spam classification performance improves by between 2% and 28% for almost all conventional machine learning classifiers when automated feature engineering is used. As a by-product of our comprehensive automated feature engineering, we develop a Python-based open-source tool that incorporates the proposed framework.
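The abstract mentions feature transformation and aggregation primitives without showing them. The following is only a minimal sketch of the general idea, not the authors' tool or its API: the toy corpus, the primitive names, and the helper functions `transform_features` and `aggregate_features` are all hypothetical and chosen for illustration. A transformation primitive maps one email to one value; an aggregation primitive collapses those values over a group (here, per sender).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy corpus for illustration only; not data from the paper.
EMAILS = [
    {"sender": "a@example.com", "subject": "WIN CASH NOW",
     "body": "Click http://x.example to claim!!!"},
    {"sender": "a@example.com", "subject": "Free offer",
     "body": "Limited time offer, click now"},
    {"sender": "b@example.com", "subject": "Meeting notes",
     "body": "Attached are the notes from today."},
]

# Transformation primitives (assumed names): one email -> one numeric value.
TRANSFORMS = {
    "num_chars": lambda e: len(e["body"]),
    "num_words": lambda e: len(e["body"].split()),
    "num_exclaims": lambda e: e["body"].count("!"),
    "subject_upper_ratio": lambda e: (
        sum(c.isupper() for c in e["subject"]) / max(len(e["subject"]), 1)
    ),
}

# Aggregation primitives (assumed names): many values -> one value per group.
AGGREGATIONS = {"mean": mean, "max": max}


def transform_features(email):
    """Apply every transformation primitive to a single email."""
    return {name: fn(email) for name, fn in TRANSFORMS.items()}


def aggregate_features(emails, key="sender"):
    """Group emails by `key` and apply every aggregation to every feature."""
    grouped = defaultdict(list)
    for email in emails:
        grouped[email[key]].append(transform_features(email))
    return {
        group: {
            f"{agg}({feat})": fn([row[feat] for row in rows])
            for feat in TRANSFORMS
            for agg, fn in AGGREGATIONS.items()
        }
        for group, rows in grouped.items()
    }


per_email = [transform_features(e) for e in EMAILS]
per_sender = aggregate_features(EMAILS)
```

Stacking the two kinds of primitives is what makes the feature count grow quickly: four transformations and two aggregations already yield eight per-sender features in this sketch. The resulting feature dictionaries could then be passed to any conventional classifier.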
Pages: 429-437
Page count: 9