A high-accuracy phishing website detection method based on machine learning

Cited by: 9
Authors
Bahaghighat, Mahdi [1]
Ghasemi, Majid [1]
Ozen, Figen [2]
Affiliations
[1] Imam Khomeini Int Univ, Dept Comp Engn, Qazvin, Iran
[2] Halic Univ, Istanbul, Turkiye
Keywords
Phishing website detection; Cyber security; Machine learning; Classification; XGBoost
DOI
10.1016/j.jisa.2023.103553
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The rapid development of e-commerce, e-banking, and social networks has made phishing attack detection one of the most critical technologies in cyber security systems. To improve the efficiency of anti-phishing techniques, we present an improved predictive model based on machine learning. The proposed method uses six different algorithms: Logistic Regression, K-Nearest Neighbors, Naive Bayes, Random Forest, Support Vector Machine, and Extreme Gradient Boosting (XGBoost). The experiments are based on a public dataset of 58,000 legitimate websites and 30,647 phishing ones, with 112 attributes for each sample. Our evaluations of the feature selection process show that balancing the dataset and dropping constant features yields a noticeable improvement. We based our evaluation on eight major scenarios. The experimental results of our phishing website detection (PWD) method show that every algorithm reached an accuracy of more than 93%, and that the XGBoost classifier outperforms the others with 99.2% overall accuracy, 99.1% precision, 99.4% recall, and 99.1% specificity. In addition, the XGBoost algorithm achieved a run-time of about 1500 ms without dimension reduction, and Principal Component Analysis (PCA) reduces it to 869 ms. As a result, the proposed approach would be practical in both offline and real-time applications.
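The abstract mentions dropping constant features and reports accuracy, precision, recall, and specificity. The following is a minimal sketch of those two steps in plain Python, not the authors' implementation. The function names, the toy data, and the label convention (1 = phishing, 0 = legitimate) are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def drop_constant_features(
    rows: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], List[int]]:
    """Remove columns that hold the same value in every row.

    Returns the filtered rows and the indices of the columns that were kept.
    """
    if not rows:
        return [], []
    n_cols = len(rows[0])
    # A column is constant when the set of its values has a single element.
    kept = [j for j in range(n_cols) if len({row[j] for row in rows}) > 1]
    return [[row[j] for j in kept] for row in rows], kept


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute the four metrics named in the abstract.

    Assumed label convention: 1 = phishing (positive), 0 = legitimate (negative).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


if __name__ == "__main__":
    # Toy feature matrix: the first column is constant and gets dropped.
    features = [[1, 0, 5], [1, 1, 5], [1, 0, 7]]
    filtered, kept = drop_constant_features(features)
    print(kept, filtered)

    # Toy labels and predictions, unrelated to the paper's dataset.
    print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
```

Specificity is the recall of the legitimate class, so reporting it alongside recall shows how often legitimate sites are wrongly flagged as well as how often phishing sites are missed.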
Pages: 17