Applying effective feature selection techniques with hierarchical mixtures of experts for spam classification

Authors
Belsis, Petros [1, 2]
Fragos, Kostas [3]
Gritzalis, Stefanos [1]
Skourlas, Christos [2]
Affiliations
[1] Univ Aegean, Dept Informat & Commun Syst Engn, Samos 83200, Greece
[2] Technol Educ Inst Athens, Dept Informat, Egaleo 12210, Greece
[3] Natl Tech Univ Athens, Dept Elect & Comp Engn, Athens 15771, Greece
Keywords
Spam mail; machine learning based processing; hierarchical mixtures of experts
DOI
10.3233/JCS-2008-0319
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
E-mail abuse has been steadily increasing during the last decade. E-mail users find themselves targeted by massive quantities of unsolicited bulk e-mail, which often contains offensive language or has fraudulent intentions. Internet Service Providers (ISPs), on the other hand, have to face considerable system overloading, as the incoming mail consumes network and storage resources. Among the plethora of solutions, the most prominent in terms of cost efficiency and complexity are the text filtering approaches. Most of these approaches model the problem using linear statistical models. Despite the popularity of such models, which stems from their simplicity and relative ease of interpretation, the linearity assumption they place on the data samples is inappropriate in practice, mainly because linear models cannot capture the apparent non-linear relationships that characterize these samples. In this paper, we propose a margin-based feature selection approach integrated with a Hierarchical Mixtures of Experts (HME) system, which attempts to overcome limitations common to other machine-learning based approaches. After reducing the data dimensionality with effective feature selection algorithms, we evaluated our system on publicly available corpora of e-mails, characterized by very high similarity between legitimate and bulk e-mail (and thus low discriminative potential). We experimented with two different architectures, a hierarchical HME and a perceptron HME. The results confirm that our Spam Filtering (SF) HME method outperforms other machine learning approaches, which achieve lower recall, as well as traditional rule-based approaches, which fall considerably short in precision.
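To illustrate the kind of model the abstract refers to, the following is a minimal sketch of the prediction step of a generic two-level hierarchical mixture of experts. It is not the authors' implementation: the abstract does not specify the architecture, so the logistic experts, the softmax gates, and the names `hme_predict`, `top_gate`, and `groups` are assumptions made for illustration only. The input `x` stands for a feature vector that has already been reduced by feature selection.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sigmoid(z):
    """Logistic function, used here as a hypothetical expert output."""
    return 1.0 / (1.0 + math.exp(-z))

def hme_predict(x, top_gate, groups):
    """Return P(spam | x) for a two-level HME (illustrative assumption).

    top_gate: one weight vector per group (softmax gate at the root).
    groups:   list of (gate_weights, expert_weights) pairs; each pair holds
              one gating vector and one logistic-expert vector per expert.
    """
    g_top = softmax([dot(v, x) for v in top_gate])
    p = 0.0
    for g_i, (gate_w, expert_w) in zip(g_top, groups):
        # Lower-level gate mixes the experts inside this group.
        g_low = softmax([dot(v, x) for v in gate_w])
        p_group = sum(g * sigmoid(dot(w, x)) for g, w in zip(g_low, expert_w))
        # Root gate mixes the groups.
        p += g_i * p_group
    return p
```

Because every gate output is a convex weighting and every expert output lies in (0, 1), the prediction is itself a probability. The non-linearity comes from the gates, which let different experts dominate in different regions of the feature space.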
Pages: 761-790
Page count: 30