A novel phishing website classification method based on hybrid sampling

Cited by: 0
Authors
Srivastava, Jaya [1]
Sharan, Aditi [2]
Affiliations
[1] Computer Services Centre, Indian Institute of Technology Delhi, New Delhi, India
[2] Jawaharlal Nehru University, New Delhi, India
Keywords
Anomaly detection - Classification (of information) - Computer crime - Cybersecurity - Logistic regression - Support vector machines
DOI
10.1080/23742917.2023.2240606
Abstract
In real-world anomaly detection tasks such as credit card fraud detection, cancer patient detection, and phishing website detection, training datasets often suffer from a skewed class distribution. Traditional machine learning (ML) classification algorithms, however, assume a balanced class distribution and equal misclassification costs. As a result, when they are trained on class-imbalanced data, they tend to produce biased and inaccurate predictive models. In this study, we propose four novel phishing website classification models, namely SMOTEENN-XGB, SMOTEENN-RF, SMOTEENN-LR, and SMOTEENN-SVM, which combine the SMOTEENN (SMOTE + ENN) hybrid sampling technique with eXtreme Gradient Boosting (XGB), Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM) classifiers, respectively. We propose SMOTEENN hybrid sampling as a novel approach to addressing class imbalance in phishing website datasets before the classification models are built. To the best of our knowledge, these four SMOTEENN-based models for phishing website detection have not been published in existing studies. © 2023 Informa UK Limited, trading as Taylor & Francis Group.
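To make the resampling step named in the abstract concrete, here is a minimal pure-Python sketch of SMOTE followed by ENN on a toy two-class dataset. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`sq_dist`, `nearest`, `smote`, `enn`, `smoteenn`), the toy data, the neighbour counts (5 for SMOTE, 3 for ENN), and the majority-vote editing rule are all choices made here. The classifier stage (XGB, RF, LR, or SVM) is omitted; in practice a library implementation such as `SMOTEENN` from `imblearn.combine` would likely be used instead.

```python
import random
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, pool, k, skip=None):
    """Indices into `pool` of the k points closest to `point`, ignoring index `skip`."""
    order = sorted((i for i in range(len(pool)) if i != skip),
                   key=lambda i: sq_dist(point, pool[i]))
    return order[:k]

def smote(X, y, minority, k=5, rng=None):
    """Add synthetic minority samples until both classes have equal size (binary case).

    Assumes the minority class has at least two samples.
    """
    rng = rng or random.Random(0)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_new = (len(X) - len(minority_pts)) - len(minority_pts)
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_new, 0)):
        i = rng.randrange(len(minority_pts))
        base = minority_pts[i]
        neighbor = minority_pts[rng.choice(nearest(base, minority_pts, k, skip=i))]
        gap = rng.random()  # position along the segment from base to neighbor
        X_out.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
        y_out.append(minority)
    return X_out, y_out

def enn(X, y, k=3):
    """Drop each sample whose label disagrees with the majority of its k nearest neighbours."""
    keep = []
    for i, (x, label) in enumerate(zip(X, y)):
        votes = Counter(y[j] for j in nearest(x, X, k, skip=i))
        if votes.most_common(1)[0][0] == label:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

def smoteenn(X, y, minority, rng=None):
    """Hybrid sampling: oversample with SMOTE, then clean with ENN."""
    X_s, y_s = smote(X, y, minority, rng=rng)
    return enn(X_s, y_s)

# Hypothetical toy data: 40 majority points on a grid near the origin,
# one noisy majority point inside the minority region, and 6 minority points.
majority = [[i * 0.1, j * 0.1] for i in range(8) for j in range(5)] + [[5.05, 5.05]]
minority = [[5 + i * 0.1, 5 + j * 0.1] for i in range(3) for j in range(2)]
X = majority + minority
y = [0] * len(majority) + [1] * len(minority)

X_bal, y_bal = smote(X, y, minority=1, rng=random.Random(0))
X_res, y_res = smoteenn(X, y, minority=1, rng=random.Random(0))

print("original: ", dict(Counter(y)))      # {0: 41, 1: 6}
print("SMOTE:    ", dict(Counter(y_bal)))  # {0: 41, 1: 41}
print("SMOTEENN: ", dict(Counter(y_res)))  # {0: 40, 1: 41}
```

On this toy set, SMOTE fills in the minority class by interpolating between existing minority points, and ENN then removes the single majority point that sits inside the minority region. The resampled `X_res`, `y_res` would be what a downstream classifier is trained on.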
Pages: 1 / 30