Unsupervised feature learning for spam email filtering

Times cited: 23
Authors
Diale, Melvin [1 ,2 ]
Celik, Turgay [1 ]
Van Der Walt, Christiaan [2 ]
Affiliations
[1] Univ Witwatersrand, Sch Comp Sci & Appl Math, 1 Jan Smuts Ave, ZA-2000 Johannesburg, South Africa
[2] Council Sci & Ind Res, Modelling & Digital Sci, Meiring Naude Rd, ZA-0001 Pretoria, South Africa
Keywords
Feature learning; Autoencoder; Spam email filtering; Spam email detection; Cosine similarity; Natural language processing; Machine learning
DOI
10.1016/j.compeleceng.2019.01.004
CLC number (Chinese Library Classification)
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
An excessive number of features may degrade the performance of a learning classifier and can also prolong the computation time needed to process the data during training. A preprocessing stage that includes feature extraction and feature reduction therefore plays a vital role in machine learning, both for speeding up computation and for improving classification accuracy. This study addresses data transformation prior to applying machine learning classifiers: it proposes a feature representation for identifying spam that preserves class separability in a lower-dimensional space. The major advantage of the proposed representation is its robustness, which enables classifiers such as Random Forest, Support Vector Machines, and the C4.5 decision tree to classify an incoming email as spam or non-spam using a very small feature set, with good generalization irrespective of the data source. (C) 2019 Elsevier Ltd. All rights reserved.
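The abstract describes replacing a large raw feature set with a compact representation learned without labels, and the keywords name an autoencoder and cosine similarity. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' method, whose architecture the abstract does not describe. Everything in it is hypothetical: the tied-weight linear autoencoder trained by full-batch gradient descent, the six-term toy vocabulary and counts, and the nearest-centroid rule that compares learned codes by cosine similarity (swapped in for the Random Forest, SVM, and C4.5 classifiers named in the abstract so the example needs only the standard library).

```python
import math
import random

# Illustrative sketch only: every function, parameter, and data value here is a
# hypothetical stand-in, not the paper's architecture or dataset.


def normalize(x):
    """Scale a term-count vector to unit L2 norm (all-zero vectors are returned as-is)."""
    n = math.sqrt(sum(v * v for v in x))
    return [v / n for v in x] if n else [float(v) for v in x]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def encode(W, x):
    """Low-dimensional code h = W x, where W is a k x d list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def decode(W, h):
    """Tied-weight reconstruction r = W^T h (length d)."""
    d = len(W[0])
    return [sum(W[i][j] * h[i] for i in range(len(W))) for j in range(d)]


def recon_loss(W, X):
    """Mean of 0.5 * ||W^T W x - x||^2 over the rows of X."""
    total = 0.0
    for x in X:
        r = decode(W, encode(W, x))
        total += 0.5 * sum((rj - xj) ** 2 for rj, xj in zip(r, x))
    return total / len(X)


def train_autoencoder(X, k, lr=0.1, epochs=300, seed=0):
    """Fit a tied-weight linear autoencoder (hypothetical choice) by full-batch gradient descent."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(k)]
    for _ in range(epochs):
        grad = [[0.0] * d for _ in range(k)]
        for x in X:
            h = encode(W, x)
            e = [rj - xj for rj, xj in zip(decode(W, h), x)]  # reconstruction error
            We = encode(W, e)  # W e, needed for the second gradient term
            for i in range(k):
                for j in range(d):
                    # gradient of 0.5*||W^T W x - x||^2 w.r.t. W is h e^T + (W e) x^T
                    grad[i][j] += h[i] * e[j] + We[i] * x[j]
        for i in range(k):
            for j in range(d):
                W[i][j] -= lr * grad[i][j] / len(X)
    return W


def nearest_centroid(W, X, y):
    """Average the codes of each class; classify new emails by cosine similarity to those centroids."""
    centroids = {}
    for label in set(y):
        codes = [encode(W, x) for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(codes) for col in zip(*codes)]

    def predict(counts):
        h = encode(W, normalize(counts))
        return max(centroids, key=lambda lab: cosine(h, centroids[lab]))

    return predict


# Columns: free, win, money, meeting, report, project (hypothetical toy vocabulary).
counts = [
    [3, 2, 2, 0, 0, 0], [2, 3, 1, 0, 0, 0], [2, 2, 3, 0, 0, 0], [3, 1, 2, 0, 0, 0],  # spam
    [0, 0, 0, 3, 2, 2], [0, 0, 0, 2, 3, 1], [0, 0, 0, 2, 2, 3], [0, 0, 0, 3, 1, 2],  # ham
]
labels = ["spam"] * 4 + ["ham"] * 4
X = [normalize(c) for c in counts]
W = train_autoencoder(X, k=2)  # 6 raw features -> 2 learned features
predict = nearest_centroid(W, X, labels)
print(round(recon_loss(W, X), 3))
print(predict([1, 1, 1, 0, 0, 0]), predict([0, 0, 0, 1, 1, 1]))
```

A linear autoencoder with tied weights can at best recover the same subspace as principal component analysis, so this sketch only shows the shape of the pipeline (learn a compact code without labels, then classify in that code space); a nonlinear encoder would be needed to go further.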
Pages: 89-104
Number of pages: 16