A new hybrid ensemble feature selection framework for machine learning-based phishing detection system

Times cited: 168
Authors
Chiew, Kang Leng [1 ]
Tan, Choon Lin [1 ]
Wong, KokSheik [2 ]
Yong, Kelvin S. C. [3 ]
Tiong, Wei King [1 ]
Affiliations
[1] Univ Malaysia Sarawak, Fac Comp Sci & Informat Technol, Kota Samarahan 94300, Sarawak, Malaysia
[2] Monash Univ Malaysia, Sch Informat Technol, Bandar Sunway 47500, Selangor, Malaysia
[3] Curtin Univ, Dept Elect & Comp Engn, Fac Engn & Sci, CDT 250, Miri 98009, Sarawak, Malaysia
Keywords
Phishing detection; Feature selection; Machine learning; Ensemble-based; Classification; Phishing dataset
DOI
10.1016/j.ins.2019.01.064
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes a new feature selection framework for machine learning-based phishing detection systems, called Hybrid Ensemble Feature Selection (HEFS). In the first phase of HEFS, a novel Cumulative Distribution Function gradient (CDF-g) algorithm is used to produce primary feature subsets, which are then fed into a data perturbation ensemble to yield secondary feature subsets. The second phase derives a set of baseline features from the secondary feature subsets using a function perturbation ensemble. The overall experimental results suggest that HEFS performs best when integrated with the Random Forest classifier: the baseline features correctly distinguish 94.6% of phishing and legitimate websites while using only 20.8% of the original features. In another experiment, the baseline features (10 in total) used with Random Forest outperform the full feature set (48 in total) used with the SVM, Naive Bayes, C4.5, JRip, and PART classifiers. HEFS also shows promising results when benchmarked on another well-known phishing dataset from the University of California Irvine (UCI) repository. Hence, HEFS is a highly desirable and practical feature selection technique for machine learning-based phishing detection systems. (C) 2019 Elsevier Inc. All rights reserved.
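The abstract describes HEFS only at a high level, so the following is a hedged, illustrative sketch of a two-phase hybrid ensemble feature selector, not the authors' implementation. Everything concrete here is an assumption: the CDF-g rule is approximated as "rank features by score and stop once a feature's step in the normalised cumulative score falls below a hypothetical `min_gradient`", data perturbation is modelled as bootstrap resampling with a majority vote, function perturbation uses two measures (information gain and a hypothetical `mean_difference` score), and the baseline set is taken as the intersection of the per-measure subsets.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Information gain of one discrete feature column with respect to the labels."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def mean_difference(column, labels):
    """Hypothetical second measure: |mean in class 1 - mean in class 0|."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def cdf_gradient_cutoff(scores, min_gradient=0.05):
    """Assumed CDF-g-style rule: walk features from highest to lowest score and stop
    where the cumulative score curve flattens, i.e. a feature adds < min_gradient."""
    total = sum(scores)
    if total <= 0:
        return set()
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    selected = set()
    for j in ranked:
        gradient = scores[j] / total  # step in the normalised cumulative score
        if gradient < min_gradient:
            break
        selected.add(j)
    return selected


def secondary_subset(rows, labels, score_fn, n_resamples=5, seed=0):
    """Data perturbation (assumed: bootstrap resampling plus majority vote)."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    votes = Counter()
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[k] for k in idx]
        scores = [score_fn([rows[k][j] for k in idx], sample_labels) for j in range(d)]
        votes.update(cdf_gradient_cutoff(scores))  # one primary subset per resample
    return {j for j, v in votes.items() if v > n_resamples / 2}


def baseline_features(rows, labels, score_fns):
    """Function perturbation (assumed: intersect the secondary subsets of all measures)."""
    subsets = [secondary_subset(rows, labels, fn) for fn in score_fns]
    return set.intersection(*subsets)


if __name__ == "__main__":
    # Toy data: feature 0 copies the label, feature 1 is a noisy copy,
    # feature 2 is an unrelated pattern, feature 3 is constant.
    labels = [i % 2 for i in range(40)]
    rows = [[y, y if i % 7 else 1 - y, (i // 2) % 2, 0] for i, y in enumerate(labels)]
    print(sorted(baseline_features(rows, labels, [information_gain, mean_difference])))
```

Adding another ranking measure to the function perturbation ensemble only requires a function with the same `(column, labels)` signature. The resampling scheme, voting threshold, and combination rule used by HEFS itself should be taken from the paper rather than from this sketch.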
Pages: 153-166
Number of pages: 14
Related papers (50 in total)
  • [1] The hybrid framework of ensemble technique in machine learning for phishing detection
    Mahajan, Akanksha S.
    Navale, Pradnya K.
    Patil, Vaishnavi V.
    Khadse, Vijay M.
    Mahalle, Parikshit N.
    INTERNATIONAL JOURNAL OF INFORMATION AND COMPUTER SECURITY, 2023, 21 (1-2) : 162 - 184
  • [2] A New Ensemble Model for Phishing Detection Based on Hybrid Cumulative Feature Selection
    Prince, Md Sirajum Munir
    Hasan, Asib
    Shah, Faisal Muhammad
    11TH IEEE SYMPOSIUM ON COMPUTER APPLICATIONS & INDUSTRIAL ELECTRONICS (ISCAIE 2021), 2021, : 7 - 12
  • [3] Phishing detection based on machine learning and feature selection methods
    Almseidin M.
    Abu Zuraiq A.M.
    Al-kasassbeh M.
    Alnidami N.
    International Journal of Interactive Mobile Technologies, 2019, 13 (12) : 71 - 183
  • [4] Feature Selection Approach for Phishing Detection Based on Machine Learning
    Wei, Yi
    Sekiya, Yuji
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON APPLIED CYBER SECURITY (ACS) 2021, 2022, 378 : 61 - 70
  • [5] Ensemble learning-based feature selection for phosphorylation site detection
    Liu, Songbo
    Cui, Chengmin
    Chen, Huipeng
    Liu, Tong
    FRONTIERS IN GENETICS, 2022, 13
  • [6] Enhancing Arabic Phishing Email Detection: A Hybrid Machine Learning Based on Genetic Algorithm Feature Selection
    Alsuwaylimi, Amjad A.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (08) : 312 - 325
  • [7] Machine learning-based phishing attack detection
    Hossain S.
    Sarma D.
    Chakma R.J.
    International Journal of Advanced Computer Science and Applications, 2020, 11 (09) : 378 - 388
  • [8] Machine Learning-Based Phishing Attack Detection
    Hossain, Sohrab
    Sarma, Dhiman
    Chakma, Rana Joyti
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (09) : 378 - 388
  • [9] Single and Hybrid-Ensemble Learning-Based Phishing Website Detection: Examining Impacts of Varied Nature Datasets and Informative Feature Selection Technique
    Adane, Kibreab
    Beyene, Berhanu
    Abebe, Mohammed
    DIGITAL THREATS: RESEARCH AND PRACTICE, 2023, 4 (03)
  • [10] Enhancing IoT Botnet Detection through Machine Learning-based Feature Selection and Ensemble Models
    Sharma, Ravi
    Din, Saika Mohi Ud
    Sharma, Nonita
    Kumar, Arun
    EAI ENDORSED TRANSACTIONS ON SCALABLE INFORMATION SYSTEMS, 2024, 11 (02) : 1 - 6