Optimization of Phishing Website Classification Based on Synthetic Minority Oversampling Technique and Feature Selection

Authors
Prayogo, Rizal Dwi [1]
Karimah, Siti Amatullah [2]
Affiliations
[1] Inst Teknol Bandung, Sch Elect Engn & Informat, Bandung, Indonesia
[2] Telkom Univ, Sch Comp, Bandung, Indonesia
Source
2020 5TH INTERNATIONAL WORKSHOP ON BIG DATA AND INFORMATION SECURITY (IWBIS 2020) | 2020
Keywords
Class imbalance problem; feature selection; K-Nearest Neighbor; phishing website classification; SMOTE
DOI
10.1109/iwbis50925.2020.9255562
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper presents a new approach for optimizing phishing website classification based on the Synthetic Minority Over-sampling Technique (SMOTE) combined with feature selection. Classification is a supervised machine learning technique that learns from features to identify the class of an instance. However, not all features are relevant for identifying phishing websites, and the class imbalance problem leads to suboptimal performance. Therefore, we propose SMOTE for handling the class imbalance problem by generating new synthetic instances for the minority class. Filter-based feature selection using Information Gain and Correlation is proposed for removing irrelevant features. Classification performance is evaluated using the K-Nearest Neighbor (KNN) classifier. The results demonstrate that SMOTE effectively improves classification performance in terms of accuracy, precision, recall, and F-measure, while also being more time-efficient. The performance of SMOTE combined with feature selection is validated and benchmarked against different techniques on both the full and the reduced feature sets. The results demonstrate that our proposed technique achieves the highest accuracy, i.e., 97.47% on full features and 94.87% on reduced features. Hence, our proposed technique is promising for optimizing phishing website classification.
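The abstract describes SMOTE as generating new synthetic instances for the minority class. The record does not include the authors' implementation, so the following is only a minimal sketch of the standard SMOTE interpolation step, written in plain Python. The function name `smote`, its parameters, and the toy data are all hypothetical and chosen for illustration; the sketch assumes numeric features and Euclidean distance.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point on the segment between base and neighbour.
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up data: 3 minority samples against an assumed 8 majority samples.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
majority_count = 8
new_points = smote(minority, majority_count - len(minority), k=2)
balanced_minority = minority + new_points
```

After oversampling, the minority class has as many samples as the assumed majority class, and every synthetic point lies on a line segment between two original minority samples. In a pipeline like the one the abstract outlines, oversampling would normally be applied to the training split only, before fitting the KNN classifier.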
Pages: 125-130
Number of pages: 6