Unsupervised feature learning for spam email filtering

Cited by: 23
Authors
Diale, Melvin [1, 2]
Celik, Turgay [1]
Van Der Walt, Christiaan [2]
Affiliations
[1] Univ Witwatersrand, Sch Comp Sci & Appl Math, 1 Jan Smuts Ave, ZA-2000 Johannesburg, South Africa
[2] Council Sci & Ind Res, Modelling & Digital Sci, Meiring Naude Rd, ZA-0001 Pretoria, South Africa
Keywords
Feature learning; Autoencoder; Spam email filtering; Spam email detection; Cosine similarity; Natural language processing; Machine learning
DOI
10.1016/j.compeleceng.2019.01.004
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
An excessive number of features may negatively affect the performance of a learning classifier and prolong the computational time needed to process the data during training. Therefore, a preprocessing stage that includes feature extraction and feature reduction plays a vital role in speeding up computation and improving classification accuracy. The problem considered in this study is data transformation prior to applying machine learning classifiers. A feature representation that preserves class separability in a lower-dimensional space is proposed for identifying spam. The major advantage of the proposed feature representation is its robustness, which enables classifiers such as Random Forest, Support Vector Machines, and the C4.5 decision tree to identify an incoming email as spam or non-spam with a very small feature size and good generalization irrespective of the data source. (C) 2019 Elsevier Ltd. All rights reserved.
Pages: 89-104
Number of pages: 16
Related papers
50 in total
  • [21] Spam filtering and email-mediated applications
    Li, Wenbin
    Zhong, Ning
    Yao, Y. Y.
    Liu, Jiming
    Liu, Chunnian
    WEB INTELLIGENCE MEETS BRAIN INFORMATICS, 2007, 4845 : 382 - 405
  • [22] Filtering spam email based on retry patterns
    Lieven, Peter
    Scheuermann, Bjoern
    Stini, Michael
    Mauve, Martin
    2007 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, VOLS 1-14, 2007, : 1515 - 1520
  • [23] On extendable software architecture for spam email filtering
    Ma, Wanli
    Tran, Dat
    Sharma, Dharmendra
    IMECS 2007: INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, VOLS I AND II, 2007, : 924 - +
  • [24] Machine learning for email spam filtering: review, approaches and open research problems
    Dada, Emmanuel Gbenga
    Bassi, Joseph Stephen
    Chiroma, Haruna
    Abdulhamid, Shafi'i Muhammad
    Adetunmbi, Adebayo Olusola
    Ajibuwa, Opeyemi Emmanuel
    HELIYON, 2019, 5 (06)
  • [25] Online active multi-field learning for efficient email spam filtering
    Liu, Wuying
    Wang, Ting
    KNOWLEDGE AND INFORMATION SYSTEMS, 2012, 33 (01) : 117 - 136
  • [26] Online active multi-field learning for efficient email spam filtering
    Wuying Liu
    Ting Wang
    Knowledge and Information Systems, 2012, 33 : 117 - 136
  • [27] Feature selection for spam filtering
    Menghour, Kamilia
    Souici-Meslati, Labiba
    CORIA 2010: Actes de la COnference en Recherche d'Information et Applications - Proceedings of the Conference on Information Retrieval and Applications, 2010, : 349 - 360
  • [28] ADAPTIVE PRIVACY POLICY PREDICTION FOR EMAIL SPAM FILTERING
    Rajendran, P.
    Hemalatha, S. M.
    Janaki, M.
    Durkananthini, B.
    2016 WORLD CONFERENCE ON FUTURISTIC TRENDS IN RESEARCH AND INNOVATION FOR SOCIAL WELFARE (STARTUP CONCLAVE), 2016,
  • [29] Efficient spam email filtering using adaptive ontology
    Youn, Seongwook
    McLeod, Dennis
    INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY, PROCEEDINGS, 2007, : 249 - +
  • [30] A Collaborative Abstraction Based Email Spam Filtering with Fingerprints
    Rajendran, P.
    Tamilarasi, A.
    Mynavathi, R.
    WIRELESS PERSONAL COMMUNICATIONS, 2022, 123 (02) : 1913 - 1923