Handling Imbalanced Dataset Using SVM and k-NN Approach

Cited by: 9
Authors
Wah, Yap Bee [1]
Abd Rahman, Hezlin Aryani [1]
He, Haibo [2, 3]
Bulgiba, Awang [4]
Affiliations
[1] Univ Teknol MARA Malaysia, Fac Comp & Math Sci, Shah Alam 40450, Malaysia
[2] Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881 USA
[3] Julius Ctr Univ Malaya, Kuala Lumpur, Malaysia
[4] Univ Malaya, Fac Med, Dept Social & Prevent Med, Kuala Lumpur 50603, Malaysia
Keywords
data mining; classification; imbalanced data; SVM; k-NN
DOI
10.1063/1.4954536
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Data mining classification methods are affected when the data are imbalanced, that is, when one class of a two-class dependent variable is larger than the other. Many methods have been developed to handle imbalanced datasets. For binary classification tasks, the Support Vector Machine (SVM) is one of the methods reported to give high accuracy in predictive modeling compared with other techniques such as Logistic Regression and Discriminant Analysis. The strength of SVM lies in the robustness of its algorithm and its capability to integrate with kernel-based learning, which results in a more flexible analysis and an optimized solution. Another popular way to handle imbalanced data is random sampling, such as random undersampling, random oversampling and synthetic sampling. Applying Nearest Neighbours techniques in the sampling approach has been seen as having a bigger advantage than other methods, as it can handle both structured and non-structured data. Some studies implement an ensemble of SVM and Nearest Neighbours with good results. This paper discusses the various methods for handling imbalanced data and illustrates the use of SVM and k-Nearest Neighbours (k-NN) on a real dataset.
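The random undersampling and random oversampling mentioned in the abstract can be sketched as follows. This is a minimal illustration in plain Python, not the paper's implementation: the function names `random_undersample` and `random_oversample`, the `seed` parameter, and the toy data are all assumptions made for this example.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect feature rows under their class label."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    return by_class


def random_undersample(X, y, seed=0):
    """Randomly drop rows until every class has the smallest class's count."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Sample without replacement, so no row is duplicated.
        for features in rng.sample(rows, target):
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Randomly repeat rows until every class has the largest class's count."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 4 minority rows (label 1) and 16 majority rows (label 0).
X = [[float(i), float(i % 3)] for i in range(20)]
y = [1 if i < 4 else 0 for i in range(20)]

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)

print("original:    ", dict(Counter(y)))
print("undersampled:", dict(Counter(y_under)))
print("oversampled: ", dict(Counter(y_over)))
```

Undersampling discards majority-class information, while oversampling repeats minority rows and can encourage overfitting; either resampled set could then be passed to a classifier such as SVM or k-NN.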
Pages: 8