A new hybrid ensemble feature selection framework for machine learning-based phishing detection system

Times cited: 168
Authors
Chiew, Kang Leng [1]
Tan, Choon Lin [1]
Wong, KokSheik [2]
Yong, Kelvin S. C. [3]
Tiong, Wei King [1]
Affiliations
[1] Univ Malaysia Sarawak, Fac Comp Sci & Informat Technol, Kota Samarahan 94300, Sarawak, Malaysia
[2] Monash Univ Malaysia, Sch Informat Technol, Bandar Sunway 47500, Selangor, Malaysia
[3] Curtin Univ, Dept Elect & Comp Engn, Fac Engn & Sci, CDT 250, Miri 98009, Sarawak, Malaysia
Keywords
Phishing detection; Feature selection; Machine learning; Ensemble-based; Classification; Phishing dataset
DOI
10.1016/j.ins.2019.01.064
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This paper proposes a new feature selection framework for machine learning-based phishing detection systems, called Hybrid Ensemble Feature Selection (HEFS). In the first phase of HEFS, a novel Cumulative Distribution Function gradient (CDF-g) algorithm is exploited to produce primary feature subsets, which are then fed into a data perturbation ensemble to yield secondary feature subsets. The second phase derives a set of baseline features from the secondary feature subsets by using a function perturbation ensemble. The overall experimental results suggest that HEFS performs best when integrated with the Random Forest classifier, where the baseline features correctly distinguish 94.6% of phishing and legitimate websites using only 20.8% of the original features. In another experiment, the baseline features (10 in total) used with Random Forest outperform the set of all features (48 in total) used with the SVM, Naive Bayes, C4.5, JRip, and PART classifiers. HEFS also shows promising results when benchmarked on another well-known phishing dataset from the University of California, Irvine (UCI) repository. Hence, HEFS is a highly desirable and practical feature selection technique for machine learning-based phishing detection systems. (C) 2019 Elsevier Inc. All rights reserved.
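To make the two-phase structure in the abstract more concrete, here is a minimal sketch of a two-level ensemble feature selector. The abstract does not specify the internals of CDF-g, the resampling scheme, the filter measures, or the aggregation rules, so everything below is an illustrative assumption and not the authors' implementation. This includes the hypothetical function names (`cdf_gradient_cutoff`, `hefs_sketch`, and so on), bootstrap resampling as the data perturbation, information gain and a class-mean gap as the two filter measures for the function perturbation, a fixed `threshold` on the gradient of the normalised cumulative score curve, and majority voting at both levels.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(column, labels):
    """Information gain of a discrete feature column with respect to the labels."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def mean_gap(column, labels):
    """Absolute difference of the feature mean between class 1 and class 0."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def cdf_gradient_cutoff(scores, threshold=0.05):
    """Hypothetical CDF-gradient cutoff over ranked feature scores (assumption).

    Features are ranked by score. With unit spacing between ranks, the gradient
    of the normalised cumulative score curve at a rank equals that feature's
    share of the total score. Features are kept until the gradient falls below
    `threshold`. The top-ranked feature is always kept when the total is positive.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(score for _, score in ranked)
    if total <= 0:
        return []
    selected = []
    for name, score in ranked:
        if score / total < threshold and selected:
            break
        selected.append(name)
    return selected


def score_features(X, y, names, scorer):
    """Apply one filter measure to every feature column of X."""
    return {name: scorer([row[j] for row in X], y) for j, name in enumerate(names)}


def vote(subsets, min_fraction=0.5):
    """Keep features chosen by at least `min_fraction` of the subsets."""
    counts = Counter(f for subset in subsets for f in subset)
    return {f for f, c in counts.items() if c >= min_fraction * len(subsets)}


def hefs_sketch(X, y, names, scorers, n_resamples=5, threshold=0.05, seed=0):
    """Two-level ensemble: bootstrap resamples per measure, then a vote across measures."""
    rng = random.Random(seed)
    secondary = []
    for scorer in scorers:  # function perturbation: one secondary subset per measure
        primary = []
        for _ in range(n_resamples):  # data perturbation: bootstrap resampling
            idx = [rng.randrange(len(y)) for _ in range(len(y))]
            Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
            ranked = score_features(Xb, yb, names, scorer)
            primary.append(cdf_gradient_cutoff(ranked, threshold))
        secondary.append(vote(primary))
    return vote(secondary)  # baseline features


if __name__ == "__main__":
    # Toy binary data: f0 copies the label, f1 is constant, f2 is unrelated to the label.
    X = [[i % 2, 0, (i // 2) % 2] for i in range(40)]
    y = [i % 2 for i in range(40)]
    print(sorted(hefs_sketch(X, y, ["f0", "f1", "f2"], [info_gain, mean_gap])))
```

Majority voting is used at both levels here only because it is a simple, order-independent way to merge subsets. Intersection or rank averaging would be equally plausible stand-ins, and the paper itself should be consulted for the aggregation HEFS actually uses. On the toy data, the label-copying feature `f0` survives both levels, while the constant feature `f1` never passes the cutoff.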
Pages: 153-166
Number of pages: 14