Boosting support vector machines for imbalanced data sets

Authors
Wang, Benjamin X. [1]
Japkowicz, Nathalie [1]
Affiliation
[1] Univ Ottawa, Sch Informat Technol & Engn, Ottawa, ON K1N 6N5, Canada
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Real-world data mining applications must address the problem of learning from imbalanced data sets, in which the number of instances in one class greatly exceeds the number of instances in the other class. Such data sets often cause a default classifier (one that essentially always predicts the majority class) to be built, because of skewed vector spaces or a lack of information about the minority class. Common approaches to the class imbalance problem modify either the data distribution or the classifier; in this work, we combine both approaches. We use soft-margin support vector machines (SVMs) as the base classifier to address the skewed vector space problem, and then apply a boosting algorithm to obtain an ensemble classifier with lower error than a single classifier. We found that this ensemble of SVMs markedly improves prediction performance, not only for the majority class but also for the minority class.
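The abstract combines two ingredients: soft-margin SVMs as base learners and a boosting loop that reweights training instances between rounds. The following is a minimal sketch of that combination under stated assumptions, not the authors' implementation. The base learner is a linear SVM trained by Pegasos-style stochastic subgradient descent (a swapped-in solver chosen so the example needs only the standard library); the per-class misclassification costs `cost_pos` and `cost_neg` are an assumed way of letting the soft margin penalize minority-class errors more heavily; and the booster is standard discrete AdaBoost rather than any modified weighting the paper may use. All function names and parameters are hypothetical.

```python
import math
import random

# Illustrative sketch only: function names, parameters, and the choice of
# Pegasos + discrete AdaBoost are assumptions, not the paper's method.


def svm_score(w, x):
    """Decision value <w, [x, 1]>; the last weight acts as the bias."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def svm_predict(w, x):
    return 1 if svm_score(w, x) >= 0 else -1


def train_cost_sensitive_svm(X, y, weights, cost_pos=1.0, cost_neg=1.0,
                             lam=0.01, epochs=100, seed=0):
    """Linear soft-margin SVM fit by Pegasos-style stochastic subgradient descent.

    Approximately minimizes
        lam/2 * ||w||^2 + sum_i weights[i] * cost(y_i) * max(0, 1 - y_i * <w, [x_i, 1]>)
    where weights sum to 1, cost(+1) = cost_pos and cost(-1) = cost_neg.
    """
    n = len(X)
    Xa = [list(x) + [1.0] for x in X]  # append a constant feature for the bias
    w = [0.0] * len(Xa[0])
    rng = random.Random(seed)
    order = list(range(n))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer step
            if margin < 1.0:  # hinge loss active: move toward y_i * x_i
                c = cost_pos if y[i] > 0 else cost_neg
                step = eta * n * weights[i] * c
                w = [wj + step * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w


def boost_svms(X, y, rounds=10, cost_pos=1.0, cost_neg=1.0):
    """Discrete AdaBoost with cost-sensitive linear SVMs as base learners."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, svm_weights)
    for r in range(rounds):
        w = train_cost_sensitive_svm(X, y, weights, cost_pos, cost_neg, seed=r)
        preds = [svm_predict(w, x) for x in X]
        err = sum(wi for wi, p, yi in zip(weights, preds, y) if p != yi)
        if err >= 0.5:
            break  # no better than chance on the weighted sample: stop
        err = max(err, 1e-10)  # guard against log(1/0) on a perfect fit
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, w))
        if err <= 1e-10:
            break  # perfect fit: reweighting would leave the distribution unchanged
        # Up-weight misclassified instances, down-weight correct ones, renormalize.
        weights = [wi * math.exp(-alpha * yi * p)
                   for wi, yi, p in zip(weights, y, preds)]
        z = sum(weights)
        weights = [wi / z for wi in weights]
    return ensemble


def ensemble_predict(ensemble, x):
    """Sign of the alpha-weighted vote of the base SVMs."""
    total = sum(alpha * svm_predict(w, x) for alpha, w in ensemble)
    return 1 if total >= 0 else -1


if __name__ == "__main__":
    # Toy imbalanced data: 16 majority (-1) points, 4 minority (+1) points.
    majority = [(-1.0 - 0.2 * i, -1.0 - 0.1 * (i % 4)) for i in range(16)]
    minority = [(2.0, 2.0), (2.5, 1.5), (3.0, 2.5), (2.2, 3.0)]
    X = majority + minority
    y = [-1] * 16 + [1] * 4
    model = boost_svms(X, y, cost_pos=4.0)
    print([ensemble_predict(model, x) for x in X] == y)  # prints True
```

Setting `cost_pos` to the majority-to-minority ratio (4.0 in the toy data) is only a heuristic assumption. Because this toy set is linearly separable, the first SVM already fits it perfectly and the loop stops after one round; on overlapping classes, the reweighting step is what pushes later SVMs toward the instances that earlier rounds misclassified.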
Pages: 38-47
Page count: 10
Related papers
  • [1] Boosting support vector machines for imbalanced data sets
    Wang, Benjamin X.
    Japkowicz, Nathalie
    KNOWLEDGE AND INFORMATION SYSTEMS, 2010, 25 (01) : 1 - 20
  • [2] Boosting Support Vector Machines for Imbalanced Microarray Data
    Pratama, Risky Frasetio Wahyu
    Purnami, Santi Wulan
    Rahayu, Santi Puteri
    INNS CONFERENCE ON BIG DATA AND DEEP LEARNING, 2018, 144 : 174 - 183
  • [3] Intuitionistic fuzzy twin support vector machines for imbalanced data
    Rezvani, Salim
    Wang, Xizhao
    NEUROCOMPUTING, 2022, 507 : 16 - 25
  • [4] Locally Linear Support Vector Machines for Imbalanced Data Classification
    Krawczyk, Bartosz
    Cano, Alberto
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2021, PT I, 2021, 12712 : 616 - 628
  • [5] Krein twin support vector machines for imbalanced data classification
    Jimenez-Castano, C.
    Alvarez-Meza, A.
    Cardenas-Pena, D.
    Orozco-Gutierrez, A.
    Guerrero-Erazo, J.
    PATTERN RECOGNITION LETTERS, 2024, 182 : 39 - 45
  • [6] Classifying Remote Sensing Data with Support Vector Machines and Imbalanced Training Data
    Waske, Bjorn
    Benediktsson, Jon Atli
    Sveinsson, Johannes R.
    MULTIPLE CLASSIFIER SYSTEMS, PROCEEDINGS, 2009, 5519 : 375 - 384
  • [7] Parameter Tuning of Large Scale Support Vector Machines using Ensemble Learning with Applications to Imbalanced Data Sets
    Nakayama, Hirotaka
    Yun, Yeboon
    Uno, Yuki
    PROCEEDINGS 2012 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2012 : 2815 - 2820
  • [8] Feature selection for high-dimensional class-imbalanced data sets using Support Vector Machines
    Maldonado, Sebastian
    Weber, Richard
    Famili, Fazel
    INFORMATION SCIENCES, 2014, 286 : 228 - 246
  • [9] Classification of Imbalanced Data by Oversampling in Kernel Space of Support Vector Machines
    Mathew, Josey
    Pang, Chee Khiang
    Luo, Ming
    Leong, Weng Hoe
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (09) : 4065 - 4076