A novel phishing website classification method based on hybrid sampling

Cited by: 0
Authors
Srivastava, Jaya [1]
Sharan, Aditi [2]
Affiliations
[1] Computer Services Centre, Indian Institute of Technology Delhi, New Delhi, India
[2] Jawaharlal Nehru University, New Delhi, India
Keywords
Anomaly detection - Classification (of information) - Computer crime - Cybersecurity - Logistic regression - Support vector machines;
DOI
10.1080/23742917.2023.2240606
Abstract
In real-world anomaly detection tasks such as credit card fraud detection, cancer patient detection, and phishing website detection, the training datasets often have a skewed class distribution. However, traditional Machine Learning (ML) classification algorithms assume a balanced class distribution and equal misclassification costs. As a result, when class-imbalanced data are presented to these algorithms, they tend to produce biased and inaccurate predictive models. In this study, we propose four novel phishing website classification models, namely SMOTEENN-XGB, SMOTEENN-RF, SMOTEENN-LR, and SMOTEENN-SVM, which combine the SMOTEENN (SMOTE + ENN) hybrid sampling technique with eXtreme Gradient Boosting (XGB), Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM) classifiers, respectively. We propose SMOTEENN hybrid sampling, which oversamples the minority class with the Synthetic Minority Over-sampling Technique (SMOTE) and then cleans the resampled data with Edited Nearest Neighbours (ENN), as a novel approach to address class imbalance in phishing website datasets before building classification models. To the best of our knowledge, the four proposed models SMOTEENN-XGB, SMOTEENN-RF, SMOTEENN-LR, and SMOTEENN-SVM for phishing website detection based on the SMOTEENN hybrid sampling approach have not been published in existing studies. © 2023 Informa UK Limited, trading as Taylor & Francis Group.
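The abstract describes SMOTEENN only at a high level. As a rough illustration of the resampling idea, the following is a minimal plain-Python sketch of SMOTE oversampling followed by ENN cleaning. It is not the authors' code: the function names `smote`, `enn`, and `smoteenn`, the defaults k=5 (SMOTE) and k=3 (ENN), and the majority-vote (Wilson) form of ENN are all assumptions chosen for illustration. Library implementations such as imbalanced-learn's `imblearn.combine.SMOTEENN` differ in details, for example in the default rule ENN uses to decide which samples to remove.

```python
import math
import random
from collections import Counter


def smote(X, y, minority, k=5, seed=0):
    """SMOTE-style oversampling (illustrative sketch, hypothetical helper).

    Adds synthetic `minority` samples until that class matches the largest
    class. Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    counts = Counter(y)
    n_needed = max(counts.values()) - counts[minority]
    X_out, y_out = [list(x) for x in X], list(y)
    for _ in range(n_needed):
        base = rng.choice(minority_idx)
        # k nearest minority neighbours of the chosen sample (itself excluded)
        neighbours = sorted(
            (j for j in minority_idx if j != base),
            key=lambda j: math.dist(X[base], X[j]),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment base -> nb
        X_out.append([b + gap * (n - b) for b, n in zip(X[base], X[nb])])
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """Edited Nearest Neighbours, majority-vote form (assumed variant).

    Removes every sample whose label disagrees with the majority label of its
    k nearest neighbours; all decisions are made on the unmodified input.
    """
    keep = []
    for i, xi in enumerate(X):
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )[:k]
        vote = Counter(y[j] for j in neighbours).most_common(1)[0][0]
        if vote == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smoteenn(X, y, minority, k_smote=5, k_enn=3, seed=0):
    """Hybrid sampling: SMOTE oversampling, then ENN cleaning."""
    X_s, y_s = smote(X, y, minority, k=k_smote, seed=seed)
    return enn(X_s, y_s, k=k_enn)


if __name__ == "__main__":
    # Toy data: a majority cluster near the origin, one mislabelled
    # majority point inside the minority cluster, and a small minority cluster.
    maj_pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
               (0, 2), (2, 0), (2, 2), (5.5, 5.5)]
    min_pts = [(5, 5), (5, 6), (6, 5), (6, 6)]
    X = maj_pts + min_pts
    y = [0] * len(maj_pts) + [1] * len(min_pts)
    X_res, y_res = smoteenn(X, y, minority=1)
    print(Counter(y_res))
```

In this toy run, SMOTE first balances the two classes, and ENN then removes the mislabelled majority point at (5.5, 5.5) because all of its nearest neighbours belong to the minority class. In a model such as SMOTEENN-RF, the resampled training data would then be passed to the chosen classifier; in general, resampling is applied to the training split only, so the evaluation data keep their original class distribution.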
Pages: 1 / 30
Related papers
50 in total
  • [41] Hybrid intelligent phishing website prediction using deep neural networks with genetic algorithm-based feature selection and weighting
    Ali, Waleed
    Ahmed, Adel A.
    IET INFORMATION SECURITY, 2019, 13 (06) : 659 - 669
  • [42] An ensemble imbalanced classification method based on model dynamic selection driven by data partition hybrid sampling
    Gao, Xin
    Ren, Bing
    Zhang, Hao
    Sun, Bohao
    Li, Junliang
    Xu, Jianhang
    He, Yang
    Li, Kangsheng
    EXPERT SYSTEMS WITH APPLICATIONS, 2020, 160
  • [43] CVAE-Based Hybrid Sampling Data Augmentation Method and Interpretation for Imbalanced Classification of Gout Disease
    Si, Xiaonan
    Fu, Yifan
    Liu, Xinran
    Wang, Rulin
    Xu, Wenchang
    Wang, Lei
    ADVANCED INTELLIGENT COMPUTING IN BIOINFORMATICS, PT I, ICIC 2024, 2024, 14881 : 49 - 60
  • [44] A Sparse Sampling Method for Classification Based on Likelihood Factor
    Ding, Linge
    Sun, Fuchun
    Wang, Hongqiao
    Chen, Ning
    ADVANCES IN NEURAL NETWORKS - ISNN 2008, PT 2, PROCEEDINGS, 2008, 5264 : 268 - 275
  • [45] A novel mail filtering method against Phishing
    Inomata, A
    Rahman, M
    Okamoto, T
    Okamoto, E
    2005 IEEE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS AND SIGNAL PROCESSING (PACRIM), 2005, : 221 - 224
  • [46] Novel 'hybrid' classification method employing Bayesian networks
    Mello, KL
    Brown, SD
    JOURNAL OF CHEMOMETRICS, 1999, 13 (06) : 579 - 590
  • [47] Enhancing Website Fraud Detection: A ChatGPT-Based Approach to Phishing Detection
    Schesny, Michael
    Lutz, Nico
    Jaegle, Thomas
    Gerschner, Felix
    Klaiber, Marco
    Theissler, Andreas
    2024 IEEE 48TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE, COMPSAC 2024, 2024, : 1494 - 1495
  • [48] A Comprehensive Survey on Identification and Analysis of Phishing Website based on Machine Learning Methods
    Alkawaz, Mohammed Hazim
    Steven, Stephanie Joanne
    Hajamydeen, Asif Iqbal
    Ramli, Rusyaizila
    11TH IEEE SYMPOSIUM ON COMPUTER APPLICATIONS & INDUSTRIAL ELECTRONICS (ISCAIE 2021), 2021, : 82 - 87
  • [49] Rotation Forest-Based Logistic Model Tree for Website Phishing Detection
    Balogun, Abdullateef O.
    Akande, Noah O.
    Usman-Hamza, Fatimah E.
    Adeyemo, Victor E.
    Mabayoje, Modinat A.
    Ameen, Ahmed O.
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS, ICCSA 2021, PT IX, 2021, 12957 : 154 - 169
  • [50] A Novel Hybrid Method for Analog Circuit Fault Classification
    Zhang, Aihua
    Huang, Kailun
    Wang, Rui
    Zhang, Zhiqiang
    2017 6TH DATA DRIVEN CONTROL AND LEARNING SYSTEMS (DDCLS), 2017, : 365 - 369