BCGAN: A CGAN-based over-sampling model using the boundary class for data balancing

Cited by: 10
Authors
Son, Minjae [1]
Jung, Seungwon [2]
Jung, Seungmin [2]
Hwang, Eenjun [2]
Affiliations
[1] Kyowon, Seoul 04539, South Korea
[2] Korea Univ, Sch Elect Engn, Seoul 02841, South Korea
Source
JOURNAL OF SUPERCOMPUTING | 2021 / Vol. 77 / Issue 09
Funding
National Research Foundation, Singapore;
Keywords
Imbalanced data; Conditional generative adversarial network (CGAN); Borderline minority class; Over-sampling; MACHINE; SVM; CLASSIFICATION; PREDICTION; ACCURACY; SMOTE;
DOI
10.1007/s11227-021-03688-6
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
A class imbalance problem occurs when a dataset is split into one majority class and one minority class. This problem is critical in machine learning because it biases the models trained on such data. One popular way to address it is to use a sampling technique that balances the class distribution by either under-sampling the majority class or over-sampling the minority class. Existing over-sampling techniques, however, suffer from overfitting and from generating noisy data. In this paper, we propose an over-sampling scheme based on the borderline class and a conditional generative adversarial network (CGAN). More specifically, we define a borderline class based on the minority class data near the majority class. Then, we generate data for the borderline class using the CGAN for data balancing. To demonstrate the performance of the proposed scheme, we conducted various experiments on diverse imbalanced datasets. We report some of the results.
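
The abstract does not say how "near the majority class" is measured, so the following is only a minimal sketch of the borderline-class idea under an assumed criterion: a minority sample is relabeled as borderline when at least half of its k nearest neighbours belong to the majority class. The function name `label_borderline`, the parameter `k`, the 0.5 threshold, and the label values 0/1/2 are all hypothetical and are not taken from the paper. The resulting third label is what a CGAN could then be conditioned on when generating borderline samples; the CGAN itself is not shown.

```python
import numpy as np


def label_borderline(X, y, k=5, minority=1, borderline=2):
    """Relabel minority samples that sit near the majority class.

    Assumed criterion (not from the paper): a minority sample is
    "borderline" when at least half of its k nearest neighbours
    are not minority samples.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    new_y = y.copy()

    # Pairwise squared Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour

    for i in np.flatnonzero(y == minority):
        neighbours = np.argsort(dist[i])[:k]
        majority_share = np.mean(y[neighbours] != minority)
        if majority_share >= 0.5:
            new_y[i] = borderline
    return new_y


# Toy data: six majority points (label 0) in a tight cluster, one minority
# point (label 1) right next to that cluster, and four minority points
# far away from it.
X_demo = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5], [0.0, 0.5],
    [1.2, 1.2],
    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0], [6.0, 6.0],
])
y_demo = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

new_y_demo = label_borderline(X_demo, y_demo, k=3)
print(new_y_demo.tolist())
```

In this toy case only the minority point at (1.2, 1.2) is surrounded by majority neighbours, so it alone moves to the borderline label; the distant minority cluster keeps its original label.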
Pages: 10463-10487
Page count: 25
Related papers
50 in total
  • [1] BCGAN: A CGAN-based over-sampling model using the boundary class for data balancing
    Minjae Son
    Seungwon Jung
    Seungmin Jung
    Eenjun Hwang
    The Journal of Supercomputing, 2021, 77 : 10463 - 10487
  • [2] BCGAN-based Over-sampling Scheme for Imbalanced Data
    Son, Minjae
    Jung, Seungwon
    Moon, Jihoon
    Hwang, Eenjun
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP 2020), 2020, : 155 - 160
  • [3] METAbolomics data Balancing with Over-sampling Algorithms (META-BOA): an online resource for addressing class imbalance
    Hashimoto-Roth, Emily
    Surendra, Anuradha
    Lavallee-Adam, Mathieu
    Bennett, Steffany A. L.
    Cuperlovic-Culf, Miroslava
    BIOINFORMATICS, 2022, 38 (23) : 5326 - 5327
  • [4] Clustering boundary over-sampling classification method for imbalanced data sets
    Lou, Xiao-Jun
    Sun, Yu-Xuan
    Liu, Hai-Tao
    Liu, H.-T. (liuhaitao@wsn.cn), 1600, Zhejiang University (47): : 944 - 950
  • [5] Over-Sampling Method on Imbalanced Data Based on WKMeans and SMOTE
    Chen, Junfeng
    Zheng, Zhongtuan
    Computer Engineering and Applications, 2024, 57 (23) : 106 - 112
  • [6] Denoise-Based Over-Sampling for Imbalanced Data Classification
    Dan, Wang
    Yian, Liu
    2020 19TH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED COMPUTING AND APPLICATIONS FOR BUSINESS ENGINEERING AND SCIENCE (DCABES 2020), 2020, : 275 - 278
  • [7] Imbalanced Data Over-Sampling Method Based on ISODATA Clustering
    Lv, Zhenzhe
    Liu, Qicheng
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2023, E106D (09) : 1528 - 1536
  • [8] Unbalanced data classification based on over-sampling and integrated learning
    Zhang, Yongjun
    Jian, Xiaowen
    2021 ASIA-PACIFIC CONFERENCE ON COMMUNICATIONS TECHNOLOGY AND COMPUTER SCIENCE (ACCTCS 2021), 2021, : 332 - 337
  • [9] Noise Reduction A Priori Synthetic Over-Sampling for class imbalanced data sets
    Rivera, William A.
    INFORMATION SCIENCES, 2017, 408 : 146 - 161
  • [10] Transfer synthetic over-sampling for class-imbalance learning with limited minority class data
    Xu-Ying Liu
    Sheng-Tao Wang
    Min-Ling Zhang
    Frontiers of Computer Science, 2019, 13 : 996 - 1009