Improving SVM Classification on Imbalanced Datasets by Introducing a New Bias

Cited by: 28
Authors
Nunez, Haydemar [1]
Gonzalez-Abril, Luis [2]
Angulo, Cecilio [3]
Affiliations
[1] Univ Cent Venezuela, Fac Ciencias, Escuela Comp, Paseo Ilustres Caracas 1040, Venezuela
[2] Univ Seville, Seville, Spain
[3] Tech Univ Catalonia, Barcelona, Spain
Keywords
Support Vector Machine; Post-processing; Bias; Cost-sensitive strategy; SMOTE
DOI
10.1007/s00357-017-9242-x
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Like most learning machines, Support Vector Machines (SVMs) trained on imbalanced datasets can perform poorly on the minority class, because SVMs were designed to induce a model based on the overall error. To improve performance on this kind of problem, a low-cost post-processing strategy is proposed that calculates a new bias to adjust the function learned by the SVM. The proposed bias takes into account the proportional size of the classes in order to improve performance on the minority class. This solution avoids both introducing and tuning new parameters and modifying the standard optimization problem for SVM training. Experimental results on 34 datasets with different degrees of imbalance, evaluated with standardized error measures based on sensitivity and g-means, show that the proposed method improves classification on imbalanced datasets. Furthermore, its performance is comparable to well-known cost-sensitive and Synthetic Minority Over-sampling Technique (SMOTE) schemes, without adding complexity or computational cost.
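The idea in the abstract, adjusting only the bias of an already trained SVM and judging the result by sensitivity and g-mean, can be illustrated with a minimal sketch. The record does not give the paper's bias formula, so the shift below is a hand-picked placeholder and not the authors' rule. The decision scores, the labels, and the helper names `predict_with_bias_shift` and `g_mean` are all hypothetical.

```python
from math import sqrt

def predict_with_bias_shift(scores, shift):
    """Classify SVM decision scores f(x) after adding a shift to the bias."""
    return [1 if s + shift >= 0 else -1 for s in scores]

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (recall on +1) and specificity (recall on -1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = sum(1 for t in y_true if t == -1)
    return sqrt((tp / n_pos) * (tn / n_neg))

# Hypothetical decision scores: 3 minority (+1) and 9 majority (-1) examples.
scores = [0.4, -0.2, -0.3,
          -1.5, -1.2, -0.9, -1.1, -0.8, -1.3, -0.6, -1.0, -0.7]
y_true = [1, 1, 1] + [-1] * 9

# Original bias: two of the three minority examples fall on the wrong side.
print(g_mean(y_true, predict_with_bias_shift(scores, 0.0)))

# Placeholder shift (NOT the paper's formula) that moves the boundary
# toward the majority class without retraining the SVM.
print(g_mean(y_true, predict_with_bias_shift(scores, 0.5)))
```

Because only the bias changes, the support vectors and their coefficients stay as trained, which is why a strategy of this kind adds no training cost.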
Pages: 427-443
Page count: 17
Related papers
50 in total
  • [31] A New Loss Function for Traffic Classification Task on Dramatic Imbalanced Datasets
    Xu, Luyang
    Zhou, Xu
    Lin, Xifeng
    Ren, Yongmao
    Qin, Yifang
    Liu, Jun
    ICC 2020 - 2020 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2020,
  • [32] Dealing with high-dimensional class-imbalanced datasets: Embedded feature selection for SVM classification
    Maldonado, Sebastian
    Lopez, Julio
    APPLIED SOFT COMPUTING, 2018, 67 : 94 - 105
  • [33] INTRODUCING THREE NEW BENCHMARK DATASETS FOR HIERARCHICAL TEXT CLASSIFICATION
    du Toit, Jaco
    Redelinghuys, Herman
    Dunaiski, Marcel
    arXiv,
  • [34] Z-SVM: An SVM for improved classification of imbalanced data
    Imam, Tasadduq
    Ting, Kai Ming
    Kamruzzaman, Joarder
    AI 2006: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4304 : 264 - +
  • [35] Combination Approach of SMOTE and Biased-SVM for Imbalanced Datasets
    Wang He-Yong
    2008 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-8, 2008, : 228 - 231
  • [36] A robust loss function for classification with imbalanced datasets
    Wang, Yidan
    Yang, Liming
    NEUROCOMPUTING, 2019, 331 : 40 - 49
  • [37] Imbalanced classification in sparse and large behaviour datasets
    Jellis Vanhoeyveld
    David Martens
    Data Mining and Knowledge Discovery, 2018, 32 : 25 - 82
  • [38] FLSOM with Different Rates for Classification in Imbalanced Datasets
    Machon-Gonzalez, Ivan
    Lopez-Garcia, Hilario
    ARTIFICIAL NEURAL NETWORKS - ICANN 2008, PT I, 2008, 5163 : 642 - 651
  • [39] Imbalanced classification in sparse and large behaviour datasets
    Vanhoeyveld, Jellis
    Martens, David
    DATA MINING AND KNOWLEDGE DISCOVERY, 2018, 32 (01) : 25 - 82
  • [40] A-SMOTE: A New Preprocessing Approach for Highly Imbalanced Datasets by Improving SMOTE
    Ahmed Saad Hussein
    Tianrui Li
    Chubato Wondaferaw Yohannese
    Kamal Bashir
    International Journal of Computational Intelligence Systems, 2019, 12 : 1412 - 1422