Improving SVM Classification on Imbalanced Datasets by Introducing a New Bias

Times cited: 28

Authors:

Nunez, Haydemar [1]

Gonzalez-Abril, Luis [2]

Angulo, Cecilio [3]

Affiliations:

[1] Univ Cent Venezuela, Fac Ciencias, Escuela Comp, Paseo Ilustres Caracas 1040, Venezuela

[2] Univ Seville, Seville, Spain

[3] Tech Univ Catalonia, Barcelona, Spain

Source:

JOURNAL OF CLASSIFICATION | 2017 / Vol. 34 / Issue 3

Keywords:

Support Vector Machine; Post-processing; Bias; Cost-sensitive strategy; SMOTE

DOI:

10.1007/s00357-017-9242-x

Chinese Library Classification (CLC) number:

O1 [Mathematics]

Discipline classification code:

0701; 070101

Abstract:

Support Vector Machines (SVMs), like most learning machines, can perform poorly on the minority class when trained on imbalanced datasets, because they were designed to induce a model based on the overall error. To improve their performance on this kind of problem, a low-cost post-processing strategy is proposed that calculates a new bias to adjust the function learned by the SVM. The proposed bias takes into account the relative sizes of the classes in order to improve performance on the minority class. This solution avoids both introducing and tuning new parameters and modifying the standard optimization problem for SVM training. Experimental results on 34 datasets with different degrees of imbalance, evaluated with standardized error measures based on sensitivity and g-means, show that the proposed method does improve classification on imbalanced datasets. Furthermore, its performance is comparable to that of well-known cost-sensitive and Synthetic Minority Over-sampling Technique (SMOTE) schemes, without adding complexity or computational cost.
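The abstract describes the method only at a high level: after training, the bias b of the SVM decision function f(x) = w·φ(x) + b is replaced by a new value that accounts for the class sizes, while w and the training problem stay untouched. The Python sketch below illustrates that kind of post-processing together with the sensitivity and g-mean measures used in the evaluation. It is a sketch under stated assumptions: g(x) denotes the bias-free SVM output, labels are +1 (minority) and -1 (majority), the toy scores are made up, and `proportional_bias` is a hypothetical stand-in rule, not the formula from the paper.

```python
from math import sqrt


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity (minority recall) and specificity; +1 = minority, -1 = majority."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p != 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != 1 and p != 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != 1 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(y_true, y_pred)
    return sqrt(sens * spec)


def predict(scores, bias):
    """Sign of f(x) = g(x) + b, where g(x) = w . phi(x) is the bias-free SVM output."""
    return [1 if s + bias >= 0 else -1 for s in scores]


def proportional_bias(scores, y_true):
    """HYPOTHETICAL class-size-aware bias, for illustration only (not the paper's rule).

    Instead of the plain midpoint between the innermost majority score and the
    innermost minority score, take a class-size-weighted average of the two,
    which pulls the threshold toward the majority class when it is larger.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t != 1]
    n_pos, n_neg = len(pos), len(neg)
    threshold = (n_neg * max(neg) + n_pos * min(pos)) / (n_pos + n_neg)
    return -threshold  # predict +1 when g(x) >= threshold


# Made-up bias-free scores: 8 majority (-1) points and 2 minority (+1) points.
scores = [-2.0, -1.8, -1.5, -1.2, -1.0, -0.8, -0.6, -0.4, 0.1, 0.5]
labels = [-1] * 8 + [1, 1]

b_old = -0.2  # made-up original bias that misclassifies one minority point
b_new = proportional_bias(scores, labels)  # only the bias changes; scores are reused

print(sensitivity_specificity(labels, predict(scores, b_old)))  # (0.5, 1.0)
print(sensitivity_specificity(labels, predict(scores, b_new)))  # (1.0, 1.0)
```

Because only b changes, the post-processing costs one pass over the training scores and requires no retraining, which is the property the abstract emphasizes; the paper's own bias formula should be taken from the article itself.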


Pages: 427–443

Number of pages: 17

