Boosting support vector machines for imbalanced data sets

Cited by: 0
Authors: Wang, Benjamin X. [1]; Japkowicz, Nathalie [1]
Affiliation: [1] University of Ottawa, School of Information Technology and Engineering, Ottawa, ON K1N 6N5, Canada
DOI: Not available
Chinese Library Classification (CLC) number: TP18 [Artificial intelligence theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Real-world data mining applications must address the problem of learning from imbalanced data sets, which arises when the instances of one class greatly outnumber those of the other class. Such data sets often lead to a default classifier, one that simply predicts the majority class, because the vector space is skewed or the minority class provides too little information. Common approaches to the class imbalance problem either modify the data distribution or modify the classifier; in this work, we combine both. We use support vector machines (SVMs) with soft margins as the base classifier to address the skewed vector space, and then apply a boosting algorithm to obtain an ensemble classifier with lower error than a single SVM. We found that this ensemble of SVMs considerably improves prediction performance, not only for the majority class but also for the minority class.
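The abstract combines two ingredients: soft-margin SVMs as base learners and a boosting algorithm that reweights the training data between rounds. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: it assumes a linear kernel, fits each SVM by subgradient descent on the sample-weighted primal objective, and uses standard discrete AdaBoost as the boosting algorithm. All function names, the {-1, +1} labelling with +1 as the minority class, the learning rate, the epoch count, and the toy data are hypothetical. It also reports per-class recall, which exposes the "default classifier" problem that overall accuracy hides.

```python
import math
import random


def sign(v):
    # Map a real-valued score to a label in {-1, +1}; a score of exactly 0 maps to -1.
    return 1 if v > 0 else -1


def per_class_recall(y_true, y_pred):
    # Fraction of each class's instances predicted correctly (+1 = minority here).
    recall = {}
    for label in (-1, 1):
        idx = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recall[label] = hits / len(idx) if idx else float("nan")
    return recall


def train_soft_margin_svm(X, y, d, C=1.0, lr=0.01, epochs=300):
    # Hypothetical helper: linear soft-margin SVM fitted by full-batch subgradient descent on
    #   0.5*||w||^2 + C * sum_i (n*d_i) * max(0, 1 - y_i*(w.x_i + b)),
    # where d is the boosting distribution over the n training instances.
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = list(w), 0.0  # the regularizer contributes w to the gradient
        for xi, yi, di in zip(X, y, d):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge term is active (inside margin or misclassified)
                for j in range(dim):
                    gw[j] -= C * n * di * yi * xi[j]
                gb -= C * n * di * yi
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def svm_predict(model, x):
    w, b = model
    return sign(sum(wj * xj for wj, xj in zip(w, x)) + b)


def adaboost_svm(X, y, rounds=10, **svm_kwargs):
    # Discrete AdaBoost (an assumed choice of boosting algorithm) with SVM base learners.
    n = len(X)
    d = [1.0 / n] * n
    learners, alphas = [], []
    for _ in range(rounds):
        model = train_soft_margin_svm(X, y, d, **svm_kwargs)
        preds = [svm_predict(model, xi) for xi in X]
        err = sum(di for di, p, yi in zip(d, preds, y) if p != yi)
        if err >= 0.5:  # no better than chance under d: stop boosting
            break
        err = max(err, 1e-10)  # guard against log(1/0) for a perfect learner
        alpha = 0.5 * math.log((1 - err) / err)
        learners.append(model)
        alphas.append(alpha)
        # Up-weight misclassified instances, down-weight correct ones, renormalize.
        d = [di * math.exp(-alpha * yi * p) for di, yi, p in zip(d, y, preds)]
        total = sum(d)
        d = [di / total for di in d]
    return learners, alphas


def ensemble_predict(learners, alphas, x):
    # Weighted vote of the base SVMs; an empty ensemble falls back to -1.
    return sign(sum(a * svm_predict(m, x) for m, a in zip(learners, alphas)))


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical toy data: 95 majority (-1) and 5 minority (+1) points in 2-D.
    X = [[random.gauss(-1, 1), random.gauss(-1, 1)] for _ in range(95)]
    X += [[random.gauss(1.5, 1), random.gauss(1.5, 1)] for _ in range(5)]
    y = [-1] * 95 + [1] * 5

    # A default classifier that always predicts the majority class.
    default = [-1] * len(y)
    print(sum(p == t for p, t in zip(default, y)) / len(y))  # 0.95
    print(per_class_recall(y, default))  # {-1: 1.0, 1: 0.0}

    single = train_soft_margin_svm(X, y, [1.0 / len(X)] * len(X))
    print(per_class_recall(y, [svm_predict(single, xi) for xi in X]))
    learners, alphas = adaboost_svm(X, y, rounds=10)
    print(per_class_recall(y, [ensemble_predict(learners, alphas, xi) for xi in X]))
```

The default classifier scores 95% accuracy while recognizing none of the minority instances, which is why per-class results matter here. The subgradient solver only keeps the sketch dependency-free; with scikit-learn, `SVC.fit` accepts a `sample_weight` argument that could play the role of `d`. The sketch makes no claim about how large the improvement will be on any particular data set.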
Pages: 38-47
Number of pages: 10