Boosting support vector machines for imbalanced data sets

Cited by: 0
Authors
Wang, Benjamin X. [1 ]
Japkowicz, Nathalie [1 ]
Affiliations
[1] Univ Ottawa, Sch Informat Technol & Engn, Ottawa, ON K1N 6N5, Canada
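Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract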
Real-world data mining applications must address the issue of learning from imbalanced data sets. The problem occurs when the number of instances in one class greatly outnumbers the number of instances in the other class. Such data sets often cause a default classifier to be built due to skewed vector spaces or lack of information. Common approaches for dealing with the class imbalance problem involve modifying the data distribution or modifying the classifier. In this work, we choose to use a combination of both approaches. We use support vector machines with soft margins as the base classifier to solve the skewed vector spaces problem. Then we use a boosting algorithm to get an ensemble classifier that has lower error than a single classifier. We found that this ensemble of SVMs makes an impressive improvement in prediction performance, not only for the majority class, but also for the minority class.
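The combination described in the abstract (soft-margin SVMs as base classifiers, reweighted by a boosting loop) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not name the boosting variant or the SVM solver, so discrete AdaBoost and a linear SVM trained by subgradient descent on a weighted hinge loss are assumptions here, and every function name, parameter value, and the toy data set are hypothetical.

```python
import math
import random

def train_linear_svm(X, y, sample_w, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM: subgradient descent on
    0.5*lam*||w||^2 + sum_i sample_w[i] * hinge(y_i * (w.x_i + b)).
    (Assumed solver; lam plays the role of an inverse C.)"""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi, si in zip(X, y, sample_w):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # inside the margin: hinge term is active
                for j in range(d):
                    grad_w[j] -= si * yi * xi[j]
                grad_b -= si * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svm_predict(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def boost_svms(X, y, rounds=5):
    """Discrete AdaBoost over weighted SVMs (assumed boosting variant)."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        model = train_linear_svm(X, y, weights)
        preds = [svm_predict(model, xi) for xi in X]
        err = sum(s for s, p, yi in zip(weights, preds, y) if p != yi)
        if err >= 0.5:  # no better than chance on the weighted sample
            break
        alpha = 0.5 * math.log((1 - max(err, 1e-10)) / max(err, 1e-10))
        ensemble.append((alpha, model))
        if err <= 1e-10:  # perfect fit: further rounds add nothing
            break
        # Raise the weight of misclassified points, lower the rest.
        weights = [s * math.exp(-alpha * yi * p)
                   for s, p, yi in zip(weights, preds, y)]
        total = sum(weights)
        weights = [s / total for s in weights]
    return ensemble

def ensemble_predict(ensemble, x):
    score = sum(alpha * svm_predict(model, x) for alpha, model in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical imbalanced toy data: 100 majority (-1) vs 10 minority (+1).
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
X += [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(10)]
y = [-1] * 100 + [1] * 10

ensemble = boost_svms(X, y, rounds=5)
preds = [ensemble_predict(ensemble, x) for x in X]
minority_recall = sum(p == 1 for p, yi in zip(preds, y) if yi == 1) / 10
print("ensemble size:", len(ensemble), "minority recall:", minority_recall)
```

Because boosting raises the weight of misclassified points after each round, minority-class errors increasingly dominate the weighted hinge loss in later rounds, which is one plausible reading of why an ensemble could help the minority class.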
Pages: 38-47
Page count: 10
Related papers
50 in total
  • [31] Boosting support vector machines using multiple dissimilarities
    Blanco, Angela
    Martin-Merino, Manuel
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS: KES 2007 - WIRN 2007, PT I, PROCEEDINGS, 2007, 4692 : 140 - +
  • [32] Support vector machines, kernel logistic regression and boosting
    Zhu, J
    Hastie, T
    MULTIPLE CLASSIFIER SYSTEMS, 2002, 2364 : 16 - 26
  • [33] Support vector machines for credit risk assessment with imbalanced datasets
    Khemakhem, Sihem
    Boujelbene, Younes
    INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2018, 10 (02) : 171 - 187
  • [34] Classifying Data Sets Using Support Vector Machines Based on Geometric Distance
    王红梅
    赵政
    郑建华
    Transactions of Tianjin University, 2006, (02) : 153 - 156
  • [35] Classifying data sets using posterior probability for multiclass support vector machines
    Wang, Hongmei
    Zeng, Yuan
    Zhao, Zheng
    Wang, Chengshan
    Journal of Computational Information Systems, 2008, 4 (02) : 541 - 546
  • [36] Using support vector machines for mining regression classes in large data sets
    Sun, ZH
    Gao, LX
    Sun, YX
    2002 IEEE REGION 10 CONFERENCE ON COMPUTERS, COMMUNICATIONS, CONTROL AND POWER ENGINEERING, VOLS I-III, PROCEEDINGS, 2002, : 89 - 92
  • [37] Parameters optimization of support vector machines for imbalanced data using social ski driver algorithm
    Tharwat, Alaa
    Gabel, Thomas
    NEURAL COMPUTING & APPLICATIONS, 2020, 32 (11) : 6925 - 6938
  • [38] Parameters optimization of support vector machines for imbalanced data using social ski driver algorithm
    Alaa Tharwat
    Thomas Gabel
    Neural Computing and Applications, 2020, 32 : 6925 - 6938
  • [39] Imbalanced data classification using second-order cone programming support vector machines
    Maldonado, Sebastian
    Lopez, Julio
    PATTERN RECOGNITION, 2014, 47 (05) : 2070 - 2079
  • [40] CLASSIFICATION OF HYPERSPECTRAL REMOTE SENSING IMAGES BY AN ENSEMBLE OF SUPPORT VECTOR MACHINES UNDER IMBALANCED DATA
    Eeti, Laxmi Narayana
    Buddhiraju, Krishna Mohan
    IGARSS 2018 - 2018 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2018, : 2659 - 2661