Unsupervised feature learning for spam email filtering

Cited by: 23
Authors
Diale, Melvin [1, 2]
Celik, Turgay [1]
Van Der Walt, Christiaan [2]
Affiliations
[1] Univ Witwatersrand, Sch Comp Sci & Appl Math, 1 Jan Smuts Ave, ZA-2000 Johannesburg, South Africa
[2] Council Sci & Ind Res, Modelling & Digital Sci, Meiring Naude Rd, ZA-0001 Pretoria, South Africa
Keywords
Feature learning; Autoencoder; Spam email filtering; Spam email detection; Cosine similarity; Natural language processing; Machine learning
DOI
10.1016/j.compeleceng.2019.01.004
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
An excessive number of features may negatively affect the performance of a learning classifier and may also prolong the computation time needed to process the data during training. A preprocessing stage that includes feature extraction and feature reduction therefore plays a vital role in speeding up computation and improving classification accuracy. The problem considered in this study is data transformation prior to applying machine learning classifiers. A feature representation that preserves class separability in a lower-dimensional space is proposed for identifying spam. The major advantage of the proposed feature representation is its robustness: it enables classifiers such as Random Forest, Support Vector Machines, and the C4.5 decision tree to identify an incoming email as spam or non-spam with a very small feature size and good generalization irrespective of the data source. © 2019 Elsevier Ltd. All rights reserved.
Pages: 89-104
Page count: 16