Improving SVM Classification on Imbalanced Datasets by Introducing a New Bias

Cited by: 28
Authors
Nunez, Haydemar [1]
Gonzalez-Abril, Luis [2]
Angulo, Cecilio [3]
Affiliations
[1] Univ Cent Venezuela, Fac Ciencias, Escuela Comp, Paseo Ilustres Caracas 1040, Venezuela
[2] Univ Seville, Seville, Spain
[3] Tech Univ Catalonia, Barcelona, Spain
Keywords
Support Vector Machine; Post-processing; Bias; Cost-sensitive strategy; SMOTE
DOI
10.1007/s00357-017-9242-x
Chinese Library Classification (CLC) number
O1 [Mathematics];
Subject classification code
0701; 070101
Abstract
Like most learning machines, Support Vector Machines (SVMs) trained on imbalanced datasets can perform poorly on the minority class, because SVMs were designed to induce a model based on the overall error. To improve their performance on this kind of problem, a low-cost post-processing strategy is proposed that calculates a new bias to adjust the function learned by the SVM. The proposed bias takes into account the proportional size of the classes in order to improve performance on the minority class. This solution avoids both introducing and tuning new parameters and modifying the standard optimization problem for SVM training. Experimental results on 34 datasets with different degrees of imbalance, evaluated with standardized error measures based on sensitivity and g-means, show that the proposed method improves classification on imbalanced datasets. Furthermore, its performance is comparable to that of well-known cost-sensitive and Synthetic Minority Over-sampling Technique (SMOTE) schemes, without adding complexity or computational cost.
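The idea of adjusting a learned decision function after training, and of scoring the result with sensitivity and g-mean, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not give the formula for the proposed bias, so the `HYPOTHETICAL_SHIFT` value below is an arbitrary placeholder and not the authors' bias. The toy labels and decision values are also invented for the example. The g-mean is taken here in its usual form, the square root of sensitivity times specificity.

```python
import math

def predict(scores, bias_shift=0.0):
    """Label +1 (minority class) when the shifted decision value is non-negative."""
    return [1 if s + bias_shift >= 0 else -1 for s in scores]

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity), with +1 as the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(y_true, y_pred)
    return math.sqrt(sens * spec)

# Invented toy data: 3 minority (+1) and 7 majority (-1) examples, with
# decision values f(x) as a trained SVM might output them.
y_true = [1, 1, 1, -1, -1, -1, -1, -1, -1, -1]
scores = [0.4, -0.2, -0.3, -0.5, -0.8, -1.1, -0.9, -1.5, -0.6, -2.0]

# Assumption: an arbitrary shift toward the minority class, NOT the paper's bias.
HYPOTHETICAL_SHIFT = 0.35

baseline = predict(scores)
shifted = predict(scores, HYPOTHETICAL_SHIFT)

print("baseline g-mean:", g_mean(y_true, baseline))
print("shifted g-mean: ", g_mean(y_true, shifted))
```

In this toy case the unshifted function misses two of the three minority examples, while the shifted one recovers them without misclassifying any majority example. Real data would normally show a trade-off between sensitivity and specificity, which is why a combined measure such as g-mean is useful.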
Pages: 427-443
Page count: 17
Related papers
50 in total
  • [21] μSVM - A new method for solving the problem of imbalanced dataset classification
    Yang, Zhiming
    Peng, Xiyuan
    Yi Qi Yi Biao Xue Bao/Chinese Journal of Scientific Instrument, 2008, 29 (SUPPL. 2) : 117 - 122
  • [22] Improving SVM classification on imbalanced time series data sets with ghost points
    Suzan Köknar-Tezel
    Longin Jan Latecki
    Knowledge and Information Systems, 2011, 28 : 1 - 23
  • [23] Improving SVM classification on imbalanced time series data sets with ghost points
    Koeknar-Tezel, Suzan
    Latecki, Longin Jan
    KNOWLEDGE AND INFORMATION SYSTEMS, 2011, 28 (01) : 1 - 23
  • [24] Research on classification method of high-dimensional class-imbalanced datasets based on SVM
    Chunkai Zhang
    Ying Zhou
    Jianwei Guo
    Guoquan Wang
    Xuan Wang
    International Journal of Machine Learning and Cybernetics, 2019, 10 : 1765 - 1778
  • [25] Research on classification method of high-dimensional class-imbalanced datasets based on SVM
    Zhang, Chunkai
    Zhou, Ying
    Guo, Jianwei
    Wang, Guoquan
    Wang, Xuan
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2019, 10 (07) : 1765 - 1778
  • [26] An Improved SVM-KM Model For Imbalanced Datasets
    Deng Weiguo
    Wang Li
    Wang Yiyang
    Qian Zhong
    2012 INTERNATIONAL CONFERENCE ON INDUSTRIAL CONTROL AND ELECTRONICS ENGINEERING (ICICEE), 2012 : 100 - 103
  • [27] Boosting prediction accuracy on imbalanced datasets with SVM ensembles
    Liu, Yang
    An, Aijun
    Huang, Xiangji
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2006, 3918 : 107 - 118
  • [28] Classification of Antimicrobial Peptides with Imbalanced Datasets
    Camacho, Francy L.
    Torres, Rodrigo
    Ramos Pollan, Raul
    11TH INTERNATIONAL SYMPOSIUM ON MEDICAL INFORMATION PROCESSING AND ANALYSIS, 2015, 9681
  • [29] Discrimination Aware Classification for Imbalanced Datasets
    Ristanoski, Goce
    Liu, Wei
    Bailey, James
    PROCEEDINGS OF THE 22ND ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM'13), 2013 : 1529 - 1532
  • [30] Study on source of classification in imbalanced datasets based on new ensemble classifier
    Zhai Y.
    Yang B.-R.
    Qu W.
    Sui H.-F.
    Xi Tong Gong Cheng Yu Dian Zi Ji Shu/Systems Engineering and Electronics, 2011, 33 (01) : 196 - 201