Unsupervised feature learning for spam email filtering

Cited by: 23
Authors
Diale, Melvin [1, 2]
Celik, Turgay [1]
Van Der Walt, Christiaan [2]
Affiliations
[1] Univ Witwatersrand, Sch Comp Sci & Appl Math, 1 Jan Smuts Ave, ZA-2000 Johannesburg, South Africa
[2] Council Sci & Ind Res, Modelling & Digital Sci, Meiring Naude Rd, ZA-0001 Pretoria, South Africa
Keywords
Feature learning; Autoencoder; Spam email filtering; Spam email detection; Cosine similarity; Natural language processing; Machine learning
DOI
10.1016/j.compeleceng.2019.01.004
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
An excessive number of features may degrade the performance of a learning classifier and prolong the computational time needed to process the data during training. A preprocessing stage that includes feature extraction and feature reduction therefore plays a vital role in speeding up computation and improving classification accuracy. The problem considered in this study concerns data transformation prior to applying machine learning classifiers. A feature representation that preserves class separability in a lower-dimensional space is proposed for identifying spam. Its major advantage is robustness: it enables classifiers such as Random Forest, Support Vector Machines, and the C4.5 decision tree to identify an incoming email as spam or non-spam with a very small feature size and good generalization irrespective of the data source. (C) 2019 Elsevier Ltd. All rights reserved.
Pages: 89-104
Number of pages: 16