Applying effective feature selection techniques with hierarchical mixtures of experts for spam classification

Cited by: 0
Authors
Belsis, Petros [1, 2]
Fragos, Kostas [3]
Gritzalis, Stefanos [1]
Skourlas, Christos [2]
Affiliations
[1] Univ Aegean, Dept Informat & Commun Syst Engn, Karlovassi 83200, Samos, Greece
[2] Technol Educ Inst Athens, Dept Informat, Egaleo 12210, Greece
[3] Natl Tech Univ Athens, Dept Elect & Comp Engn, Athens 15771, Greece
Keywords
Spam mail; machine learning based processing; hierarchical mixtures of experts
DOI
10.3233/JCS-2009-0377
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
E-mail abuse has increased steadily over the last decade. E-mail users are targeted by massive quantities of unsolicited bulk e-mail, which often contains offensive language or has fraudulent intent. Internet Service Providers (ISPs), for their part, face considerable system overload, as incoming mail consumes network and storage resources. Among the many proposed solutions, text filtering approaches are the most prominent in terms of cost efficiency and complexity. Most of these approaches model the problem using linear statistical models. Despite the popularity of such models, which is due both to their simplicity and to their relative ease of interpretation, the linearity assumption about the data samples is inappropriate in practice, mainly because linear models cannot capture the apparent non-linear relationships that characterize these samples. In this paper, we propose a margin-based feature selection approach integrated with a Hierarchical Mixtures of Experts (HME) system, which attempts to overcome limitations common to other machine-learning based approaches. We reduce the data dimensionality using effective feature selection algorithms and evaluate our system on publicly available corpora of e-mails characterized by very high similarity between legitimate and bulk e-mail (and thus low discriminative potential). We experimented with two different architectures, a hierarchical HME and a perceptron HME. The results confirm that our Spam Filtering (SF) HME method outperforms other machine learning approaches, which achieve lower recall, as well as traditional rule-based approaches, which achieve considerably lower precision.
Pages: 239-268
Page count: 30
Related papers
  • [1] Applying effective feature selection techniques with hierarchical mixtures of experts for spam classification
    Belsis, Petros
    Fragos, Kostas
    Gritzalis, Stefanos
    Skourlas, Christos
    [J]. JOURNAL OF COMPUTER SECURITY, 2008, 16 (06) : 761 - 790
  • [2] An effective feature selection method for web spam detection
    Asdaghi, Faeze
    Soleimani, Ali
    [J]. KNOWLEDGE-BASED SYSTEMS, 2019, 166 : 198 - 206
  • [3] Hierarchical Classification and Regression with Feature Selection
    Ke, Shih-Wen
    Yeh, Chi-Wei
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL ENGINEERING AND ENGINEERING MANAGEMENT (IEEM), 2019, : 1150 - 1154
  • [4] An Evaluation on the Efficiency of Hybrid Feature Selection in Spam Email Classification
    Mohamad, Masurah
    Selamat, Ali
    [J]. 2015 2ND INTERNATIONAL CONFERENCE ON COMPUTER, COMMUNICATIONS, AND CONTROL TECHNOLOGY (I4CT), 2015,
  • [5] Text feature selection method for hierarchical classification
    Zhu, Cui-Ling
    Ma, Jun
    Zhang, Dong-Mei
    [J]. Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2011, 24 (01) : 103 - 110
  • [6] On improving the performance of spam filters using heuristic feature selection techniques
    Wang, Ren
    Youssef, Amr M.
    Elhakeem, Ahmed K.
    [J]. 2006 23RD BIENNIAL SYMPOSIUM ON COMMUNICATIONS, 2006, : 227 - +
  • [7] On the Utility of Power Spectral Techniques With Feature Selection Techniques for Effective Mental Task Classification in Noninvasive BCI
    Gupta, Akshansh
    Agrawal, Ramesh Kumar
    Kirar, Jyoti Singh
    Andreu-Perez, Javier
    Ding, Wei-Ping
    Lin, Chin-Teng
    Prasad, Mukesh
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2021, 51 (05) : 3080 - 3092
  • [8] A Bayesian approach to model selection in hierarchical mixtures-of-experts architectures
    Jacobs, RA
    Peng, FC
    Tanner, MA
    [J]. NEURAL NETWORKS, 1997, 10 (02) : 231 - 241
  • [9] Effective Feature Selection for Classification of Promoter Sequences
    Kouser, K.
    Lavanya, P. G.
    Rangarajan, Lalitha
    Kshitish, Acharya K.
    [J]. PLOS ONE, 2016, 11 (12):
  • [10] Effective feature selection technique for text classification
    Seetha, Hari
    Murty, M. Narasimha
    Saravanan, R.
    [J]. INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2015, 7 (03) : 165 - 184