MWMOTE-Majority Weighted Minority Oversampling Technique for Imbalanced Data Set Learning

Cited by: 766
Authors
Barua, Sukarna [1]
Islam, Md. Monirul [1]
Yao, Xin [2]
Murase, Kazuyuki [3]
Affiliations
[1] Bangladesh Univ Engn & Technol, Dept Comp Sci & Engn, Dhaka 1000, Bangladesh
[2] Univ Birmingham, Nat Computat Grp, Sch Comp Sci, Birmingham B15 2TT, W Midlands, England
[3] Univ Fukui, Dept Human & Artificial Intelligence Syst, Fukui 9108507, Japan
Keywords
Imbalanced learning; undersampling; oversampling; synthetic sample generation; clustering; algorithms; SMOTE
DOI
10.1109/TKDE.2012.232
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Imbalanced learning problems contain an unequal distribution of data samples among different classes and pose a challenge to any classifier, as it becomes hard to learn the minority class samples. Synthetic oversampling methods address this problem by generating synthetic minority class samples to balance the distribution between the samples of the majority and minority classes. This paper identifies that most of the existing oversampling methods may generate the wrong synthetic minority samples in some scenarios and make learning tasks harder. To this end, a new method, called Majority Weighted Minority Oversampling TEchnique (MWMOTE), is presented for efficiently handling imbalanced learning problems. MWMOTE first identifies the hard-to-learn informative minority class samples and assigns them weights according to their Euclidean distance from the nearest majority class samples. It then generates the synthetic samples from the weighted informative minority class samples using a clustering approach. This is done in such a way that all the generated samples lie inside some minority class cluster. MWMOTE has been evaluated extensively on four artificial and 20 real-world data sets. The simulation results show that our method is better than or comparable with some other existing methods in terms of various assessment metrics, such as geometric mean (G-mean) and area under the receiver operating characteristic (ROC) curve, usually known as area under the curve (AUC).
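The three steps named in the abstract (weight minority samples by their distance to the nearest majority samples, cluster the minority class, interpolate only within a cluster) can be illustrated with a short sketch. This is not the authors' implementation. The abstract does not give the weighting formula or the clustering method, so the inverse-distance weight, the threshold-based grouping, and every function and parameter name below (`oversample`, `threshold_clusters`, `threshold`, `n_new`) are assumptions chosen for illustration.

```python
import math
import random


def nearest_majority_distance(x, majority):
    """Euclidean distance from x to its nearest majority-class sample."""
    return min(math.dist(x, m) for m in majority)


def threshold_clusters(points, threshold):
    """Assumed stand-in for the paper's clustering step: two points share a
    cluster if a chain of hops, each shorter than `threshold`, links them."""
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] == -1 and math.dist(points[i], points[j]) < threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def oversample(minority, majority, n_new, threshold, seed=0):
    """Generate n_new synthetic minority samples.

    Assumed weighting: 1 / distance to the nearest majority sample, so
    minority samples close to the majority class are picked more often.
    """
    rng = random.Random(seed)
    eps = 1e-12  # avoids division by zero when a sample coincides with a majority point
    weights = [1.0 / (nearest_majority_distance(x, majority) + eps) for x in minority]
    labels = threshold_clusters(minority, threshold)
    synthetic = []
    for _ in range(n_new):
        # Pick a seed sample with probability proportional to its weight.
        i = rng.choices(range(len(minority)), weights=weights, k=1)[0]
        # Pick a partner from the same cluster (possibly the seed itself).
        mates = [j for j, lab in enumerate(labels) if lab == labels[i]]
        j = rng.choice(mates)
        # Interpolate on the segment between the two cluster members.
        alpha = rng.random()
        x, y = minority[i], minority[j]
        synthetic.append(tuple(a + alpha * (b - a) for a, b in zip(x, y)))
    return synthetic


# Toy data: two well-separated minority groups and a majority class between them.
minority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 5.1)]
majority = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
new_samples = oversample(minority, majority, n_new=10, threshold=1.0)
```

Because both endpoints of every interpolation come from the same cluster, no synthetic point in this toy example can land in the majority-occupied region between the two minority groups, which is the property the abstract emphasizes.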
Pages: 405-425
Page count: 21
Related papers
50 in total
  • [1] NI-MWMOTE: An improving noise-immunity majority weighted minority oversampling technique for imbalanced classification problems
    Wei, Jianan
    Huang, Haisong
    Yao, Liguo
    Hu, Yao
    Fan, Qingsong
    Huang, Dong
    EXPERT SYSTEMS WITH APPLICATIONS, 2020, 158
  • [2] A Novel Synthetic Minority Oversampling Technique for Imbalanced Data Set Learning
    Barua, Sukarna
    Islam, Md. Monirul
    Murase, Kazuyuki
    NEURAL INFORMATION PROCESSING, PT II, 2011, 7063 : 735 - +
  • [3] An Improving Majority Weighted Minority Oversampling Technique for Imbalanced Classification Problem
    Wang, Chao-Ran
    Shao, Xin-Hui
    IEEE ACCESS, 2021, 9 : 5069 - 5082
  • [4] SPAW-SMOTE: Space Partitioning Adaptive Weighted Synthetic Minority Oversampling Technique For Imbalanced Data Set Learning
    Zhang, Qiang
    He, Junjiang
    Li, Tao
    Lan, Xiaolong
    Fang, Wenbo
    Li, Yihong
    COMPUTER JOURNAL, 2023, 67 (05): 1747 - 1762
  • [5] Global Data Distribution Weighted Synthetic Oversampling Technique for Imbalanced Learning
    Wang, Zhenfei
    Wang, Hongju
    IEEE ACCESS, 2021, 9 : 44770 - 44783
  • [6] WOTBoost: Weighted Oversampling Technique in Boosting for imbalanced learning
    Zhang, Wenhao
    Ramezani, Ramin
    Naeim, Arash
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019: 2523 - 2531
  • [7] An improved and random synthetic minority oversampling technique for imbalanced data
    Wei, Guoliang
    Mu, Weimeng
    Song, Yan
    Dou, Jun
    KNOWLEDGE-BASED SYSTEMS, 2022, 248
  • [8] Multiple Kernel Learning With Minority Oversampling for Classifying Imbalanced Data
    Wang, Ling
    Wang, Hongqiao
    Fu, Guangyuan
    IEEE ACCESS, 2021, 9 : 565 - 580
  • [9] Learning class-imbalanced data with region-impurity synthetic minority oversampling technique
    Li, Der-Chiang
    Wang, Ssu-Yang
    Huang, Kuan-Cheng
    Tsai, Tung-I
    INFORMATION SCIENCES, 2022, 607 : 1391 - 1407
  • [10] Minority-prediction-probability-based oversampling technique for imbalanced learning
    Wei, Zhen
    Zhang, Li
    Zhao, Lei
    INFORMATION SCIENCES, 2023, 622 : 1273 - 1295