A novel phishing website classification method based on hybrid sampling

Cited by: 0
Authors
Srivastava, Jaya [1]
Sharan, Aditi [2]
Affiliations
[1] Computer Services Centre, Indian Institute of Technology Delhi, New Delhi, India
[2] Jawaharlal Nehru University, New Delhi, India
Keywords
Anomaly detection - Classification (of information) - Computer crime - Cybersecurity - Logistic regression - Support vector machines
DOI
10.1080/23742917.2023.2240606
Abstract
In real-world anomaly detection tasks such as credit card fraud detection, cancer patient detection, and phishing website detection, training datasets often suffer from a skewed class distribution. Traditional Machine Learning (ML) classification algorithms, however, assume a balanced class distribution and equal misclassification costs. As a result, when class-imbalanced data are presented to traditional ML algorithms, they tend to produce biased and inaccurate predictive models. In this study, we propose four novel phishing website classification models, namely SMOTEENN-XGB, SMOTEENN-RF, SMOTEENN-LR, and SMOTEENN-SVM, by combining the SMOTEENN (SMOTE + ENN) hybrid sampling technique with eXtreme Gradient Boosting (XGB), Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM) classifiers, respectively. We propose SMOTEENN hybrid sampling as a novel approach to addressing class imbalance in phishing website datasets before building classification models. To the best of our knowledge, these four models for phishing website detection based on the SMOTEENN hybrid sampling approach have not been published in existing studies. © 2023 Informa UK Limited, trading as Taylor & Francis Group.
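The hybrid sampling step named in the abstract (SMOTE oversampling followed by ENN cleaning) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names (`smote`, `enn`, `smote_enn`, `make_toy_data`), the toy two-feature data, the neighbour counts, and the majority-vote ENN rule are all assumptions chosen for illustration, and the sketch handles only the binary case with at least two minority samples. In practice one would more likely use an existing library such as imbalanced-learn, which provides a `SMOTEENN` class, and then fit XGB, RF, LR, or SVM on the resampled data.

```python
import random
from collections import Counter


def _sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _neighbours(idx, X, k, candidates):
    """Indices of the k points nearest to X[idx] among `candidates`, excluding idx."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: _sq_dist(X[idx], X[j]))
    return others[:k]


def smote(X, y, minority, k=5, seed=0):
    """SMOTE (binary case): add interpolated minority samples until classes are equal."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_new = len(y) - 2 * len(min_idx)  # majority count minus minority count
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_new, 0)):
        i = rng.choice(min_idx)                        # a random minority sample
        j = rng.choice(_neighbours(i, X, k, min_idx))  # one of its minority neighbours
        gap = rng.random()                             # position along the segment i -> j
        X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """ENN: drop samples whose label disagrees with the majority of their k neighbours."""
    keep = []
    everyone = range(len(X))
    for i in everyone:
        votes = Counter(y[j] for j in _neighbours(i, X, k, everyone))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority, seed=0):
    """Hybrid sampling: oversample with SMOTE, then clean both classes with ENN."""
    return enn(*smote(X, y, minority, seed=seed))


def make_toy_data(seed=42):
    """Hypothetical imbalanced data: 40 'legitimate' (0) and 8 'phishing' (1) points."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
    X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(8)]
    return X, [0] * 40 + [1] * 8


if __name__ == "__main__":
    X, y = make_toy_data()
    X_res, y_res = smote_enn(X, y, minority=1)
    print("before:", Counter(y), "after:", Counter(y_res))
```

After SMOTE the two classes have equal size; ENN then removes samples from either class that sit in a neighbourhood dominated by the other class, so the final counts are usually close to balanced rather than exactly equal.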
Pages: 1 / 30
Related papers
50 in total
  • [21] SI-BBA – A novel phishing website detection based on Swarm intelligence with deep learning
    Pavan Kumar P.
    Jaya T.
    Rajendran V.
    Materials Today: Proceedings, 2023, 80 : 3129 - 3139
  • [22] Investigating the Effect Of Feature Selection and Dimensionality Reduction On Phishing Website Classification Problem
    Singh, Pradeep
    Jain, Niti
    Maini, Ambar
    2015 1ST INTERNATIONAL CONFERENCE ON NEXT GENERATION COMPUTING TECHNOLOGIES (NGCT), 2015, : 388 - 393
  • [23] A Deep Learning-Based Framework for Phishing Website Detection
    Tang, Lizhen
    Mahmoud, Qusay H.
    IEEE ACCESS, 2022, 10 : 1509 - 1521
  • [24] Phishing website detection based on effective machine learning approach
    Harinahalli Lokesh, Gururaj
    BoreGowda, Goutham
    Journal of Cyber Security Technology, 2021, 5 (01) : 1 - 14
  • [25] A Novel Hybrid-Jump-Based Sampling Method for Complex Social Networks
    Li, Lianggui
    Wang, Lingmin
    Wu, Wei
    Jia, Huiling
    Zhang, Yu
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2019, 6 (02) : 241 - 249
  • [26] A Novel CNN-KNN based Hybrid Method for Plant Classification
    Prasad, P. Siva
    Senthilrajan, A.
    JOURNAL OF ALGEBRAIC STATISTICS, 2022, 13 (01) : 498 - 502
  • [27] A Novel Video Classification Method Based on Hybrid Generative/Discriminative Models
    Zeng, Zhi
    Liang, Wei
    Li, Heping
    Zhang, Shuwu
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, 2008, 5342 : 705 - 713
  • [28] Improving the classification of phishing websites using a hybrid algorithm
    Sharma, Suvita Rani
    Singh, Birmohan
    Kaur, Manpreet
    COMPUTATIONAL INTELLIGENCE, 2022, 38 (02) : 667 - 689
  • [29] Development of Proposed Model Using Random Forest with Optimization Technique for Classification of Phishing Website
    Prakash Pathak
    Akhilesh Kumar Shrivas
    SN Computer Science, 5 (8)
  • [30] An Imbalanced Classification Method Based on Adaptive Sampling
    Chen Q.
    Xie J.
    Huanan Ligong Daxue Xuebao/Journal of South China University of Technology (Natural Science), 2022, 50 (04) : 26 - 34 and 45