On Feature Selection for the Prediction of Phishing Websites

Cited by: 2
Authors
Fadheel, Wesam [1]
Abusharkh, Mohamed [2]
Abdel-Qader, Ikhlas [3]
Affiliations
[1] Western Michigan Univ, Dept Comp Sci, Kalamazoo, MI 49008 USA
[2] Ferris State Univ, Sch Digital Media, Grand Rapids, MI USA
[3] Western Michigan Univ, Dept Elect & Comp Engn, Kalamazoo, MI 49008 USA
DOI
10.1109/DASC-PICom-DataCom-CyberSciTec.2017.146
CLC classification number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
With the rise of the big data paradigm, large data sets are being made available for knowledge mining. While this opens up possibilities for new insights every day, it also exposes data consumers to a growing share of low-quality, unreliable, redundant, or noisy data. Such data negatively affects the process of harvesting knowledge and recognizing patterns. Efficient feature selection methods are therefore needed to enable real-time prediction and classification systems. Feature selection is the process of identifying the most relevant attributes and removing redundant and irrelevant ones. In this study, we implemented the Kaiser-Meyer-Olkin (KMO) test as a feature selection method and applied it to a publicly available phishing dataset, namely the UCI Phishing Websites dataset. Furthermore, we used Logistic Regression and Support Vector Machine as classification methods to validate the feature selection method. Our results show only a slight difference in accuracy between the implementation using the full feature set and the proposed, much smaller feature set (almost 63% of the original features). This reduction in dimensionality is significant for real-time systems, especially when the loss in accuracy is slight. Building on this, we present a framework that enables a significant reduction in features. This opens the door for future work in which a wider set of classification algorithms will be tested to achieve both dimensionality reduction and an increase in accuracy.
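The abstract does not say exactly how the KMO test is turned into a feature filter. The sketch below assumes one common reading: compute each feature's measure of sampling adequacy, MSA_i = Σ_{j≠i} r_ij² / (Σ_{j≠i} r_ij² + Σ_{j≠i} p_ij²), where r is the Pearson correlation matrix and p holds the partial correlations derived from its inverse, then keep the features whose MSA reaches a cutoff. The 0.5 cutoff and all function names (`correlation_matrix`, `invert`, `kmo_from_corr`, `select_features`) are hypothetical illustrations, not the authors' code. It is written in plain Python to stay self-contained, and it does not cover the Logistic Regression / SVM validation step.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def correlation_matrix(data: Sequence[Sequence[float]]) -> Matrix:
    """Pearson correlation matrix of the columns of `data` (rows are samples)."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    # Centered columns; a constant column has zero norm and would divide by zero.
    cols = [[row[j] - means[j] for row in data] for j in range(k)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    return [[sum(a * b for a, b in zip(cols[i], cols[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(m: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (e.g. perfectly collinear features)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def kmo_from_corr(r: Matrix) -> Tuple[float, List[float]]:
    """Return (overall KMO, per-variable MSA list) for a correlation matrix `r`."""
    k = len(r)
    s = invert(r)
    # Partial correlation of variables i and j given all other variables.
    p = [[-s[i][j] / math.sqrt(s[i][i] * s[j][j]) for j in range(k)]
         for i in range(k)]
    msa, num, den = [], 0.0, 0.0
    for i in range(k):
        r2 = sum(r[i][j] ** 2 for j in range(k) if j != i)
        p2 = sum(p[i][j] ** 2 for j in range(k) if j != i)
        msa.append(r2 / (r2 + p2))
        num += r2          # overall KMO pools the same sums over all i != j
        den += r2 + p2
    return num / den, msa


def select_features(data: Sequence[Sequence[float]], names: Sequence[str],
                    threshold: float = 0.5) -> List[str]:
    """Keep features whose MSA reaches `threshold` (an assumed cutoff)."""
    _, msa = kmo_from_corr(correlation_matrix(data))
    return [name for name, m in zip(names, msa) if m >= threshold]
```

As a sanity check, for three features with pairwise correlation 0.5 every partial correlation is 1/3, so both the overall KMO and each MSA equal 9/13 ≈ 0.692. In practice one would pass the dataset's feature columns to `select_features` and then compare classifier accuracy on the full and reduced column sets.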
Pages: 871-876
Page count: 6