On Feature Selection for the Prediction of Phishing Websites

Cited by: 2
Authors
Fadheel, Wesam [1 ]
Abusharkh, Mohamed [2 ]
Abdel-Qader, Ikhlas [3 ]
Affiliations
[1] Western Michigan Univ, Dept Comp Sci, Kalamazoo, MI 49008 USA
[2] Ferris State Univ, Sch Digital Media, Grand Rapids, MI USA
[3] Western Michigan Univ, Dept Elect & Comp Engn, Kalamazoo, MI 49008 USA
Keywords
DOI
10.1109/DASC-PICom-DataCom-CyberSciTec.2017.146
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
With the rise of the big data paradigm, large data sets are being made available for knowledge mining. While this opens up possibilities for new insights every day, it also exposes data consumers to an increase in low-quality, unreliable, redundant, or noisy portions of the data, which negatively affects the process of harvesting knowledge and recognizing patterns. Efficient feature selection methods are therefore needed to empower real-time prediction or classification systems. Feature selection is the process of identifying the most relevant attributes and removing the redundant and irrelevant ones. In this study, we implemented the Kaiser-Meyer-Olkin (KMO) test as a feature selection method and applied it to a publicly available phishing dataset, namely the UCI phishing websites dataset. Furthermore, we used Logistic Regression and Support Vector Machine as classification methods to validate the feature selection method. Our results show only a slight difference in accuracy between the implementation using the full feature set and the one using the proposed, much smaller feature set (almost 63% of the original feature set). This reduction in dimensionality is significant for real-time systems, especially when the accuracy reduction is slight. On this basis, we present a framework enabling a significant reduction in features. This opens the door for future work in which a wider set of classification algorithms will be tested in order to achieve both dimensionality reduction and an increase in accuracy.
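The abstract does not say how the KMO test was turned into a selection rule, so the following is only a minimal sketch of one plausible reading. It computes the per-feature KMO measure of sampling adequacy (MSA) from the correlation matrix and its inverse, then keeps features whose MSA reaches a cutoff. The cutoff of 0.5 is a common convention and an assumption here, not a value taken from the paper. The helper names (`correlation_matrix`, `invert`, `kmo_per_feature`, `select_features`) and the synthetic demo data are hypothetical.

```python
import math
import random


def correlation_matrix(rows):
    """Pearson correlation matrix for a list of equal-length sample rows.
    Assumes no feature is constant (a constant column would divide by zero)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) for j in range(d)] for i in range(d)]
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(d)]
            for i in range(d)]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (assumes m is non-singular)."""
    d = len(m)
    aug = [list(row) + [float(i == j) for j in range(d)] for i, row in enumerate(m)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def kmo_per_feature(rows):
    """Per-feature KMO measure of sampling adequacy:
    MSA_j = sum r_ij^2 / (sum r_ij^2 + sum p_ij^2), summed over i != j,
    where p_ij is the partial correlation taken from the inverse correlation matrix."""
    corr = correlation_matrix(rows)
    inv = invert(corr)
    d = len(corr)
    msa = []
    for j in range(d):
        r2 = sum(corr[i][j] ** 2 for i in range(d) if i != j)
        p2 = sum(inv[i][j] ** 2 / (inv[i][i] * inv[j][j]) for i in range(d) if i != j)
        msa.append(r2 / (r2 + p2))
    return msa


def select_features(rows, threshold=0.5):
    """Indices of features whose MSA is at least `threshold`.
    The 0.5 default is an assumed convention, not the paper's setting."""
    return [j for j, m in enumerate(kmo_per_feature(rows)) if m >= threshold]


if __name__ == "__main__":
    # Synthetic data: three features share a latent factor, one is pure noise.
    random.seed(0)
    rows = []
    for _ in range(500):
        z = random.gauss(0, 1)
        rows.append([z + random.gauss(0, 0.5) for _ in range(3)] + [random.gauss(0, 1)])
    print(kmo_per_feature(rows))
    print(select_features(rows))
```

In a validation step like the one described above, the retained column indices would then be used to train the same classifiers on both the full and the reduced feature sets and compare their accuracy.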
Pages: 871-876
Page count: 6
Related papers
50 in total
  • [31] Detection of phishing websites using an efficient feature-based machine learning framework
    Rao, Routhu Srinivasa
    Pais, Alwyn Roshan
    NEURAL COMPUTING & APPLICATIONS, 2019, 31 (08): : 3851 - 3873
  • [32] Addressing feature selection and extreme learning machine tuning by diversity-oriented social network search: an application for phishing websites detection
    Nebojsa Bacanin
    Miodrag Zivkovic
    Milos Antonijevic
    K. Venkatachalam
    Jinseok Lee
    Yunyoung Nam
    Marina Marjanovic
    Ivana Strumberger
    Mohamed Abouhawwash
    Complex & Intelligent Systems, 2023, 9 : 7269 - 7304
  • [33] Detecting Phishing Websites Using an Efficient Feature-based Machine Learning Framework
    Sundaram, K. Mohana
    Sasikumar, R.
    Meghana, Atthipalli Sai
    Anuja, Arava
    Praneetha, Chandolu
    REVISTA GEINTEC-GESTAO INOVACAO E TECNOLOGIAS, 2021, 11 (02): : 2106 - 2112
  • [34] PhiKitA: Phishing Kit Attacks Dataset for Phishing Websites Identification
    Castano, Felipe
    Fernandez, Eduardo Fidalgo
    Alaiz-Rodriguez, Rocio
    Alegre, Enrique
    IEEE ACCESS, 2023, 11 : 40779 - 40789
  • [35] Phishing detection based on machine learning and feature selection methods
    Almseidin M.
    Abu Zuraiq A.M.
    Al-kasassbeh M.
    Alnidami N.
    International Journal of Interactive Mobile Technologies, 2019, 13 (12) : 71 - 183
  • [36] Semantic Feature Selection for Text with Application to Phishing Email Detection
    Verma, Rakesh
    Hossain, Nabil
    INFORMATION SECURITY AND CRYPTOLOGY - ICISC 2013, 2014, 8565 : 455 - 468
  • [37] An Improved Genetic Algorithm for Web Phishing Detection Feature Selection
    Wang, Jiachen
    2022 ASIA CONFERENCE ON ALGORITHMS, COMPUTING AND MACHINE LEARNING (CACML 2022), 2022, : 130 - 134
  • [39] Phishing Email Detection Based on Binary Search Feature Selection
    Sonowal G.
    SN Computer Science, 2020, 1 (4)
  • [40] Feature Selection Approach for Phishing Detection Based on Machine Learning
    Wei, Yi
    Sekiya, Yuji
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON APPLIED CYBER SECURITY (ACS) 2021, 2022, 378 : 61 - 70