An improved generative adversarial network to oversample imbalanced datasets

Cited by: 2
Authors
Pan, Tingting [1]
Pedrycz, Witold [2]
Yang, Jie [3, 4]
Wang, Jian [5]
Affiliations
[1] Dalian Polytech Univ, Dept Basic Courses Teaching, Dalian 116034, Peoples R China
[2] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB T6G 2G7, Canada
[3] Dalian Univ Technol, Sch Math Sci, Dalian 116024, Peoples R China
[4] Key Lab Computat Math & Data Intelligence Liaoning, Dalian 116024, Peoples R China
[5] China Univ Petr East China, Coll Sci, Qingdao 266580, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Imbalanced learning; Generative adversarial network (GAN); Oversampling; Probability distribution; CLASSIFICATION; CLASSIFIERS; SMOTE; GAN;
DOI
10.1016/j.engappai.2024.107934
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Many oversampling methods for imbalanced data generate samples according to the local density distribution of minority samples. However, samples generated by these methods can only present a non-deterministic relationship between the local and global distributions. A generative adversarial network (GAN) is a suitable tool for learning an unknown global probability distribution. In this paper, we propose an improved GAN (I-GAN) that oversamples according to the global underlying structure of minority samples. The originality of I-GAN stems from the fact that it provides additional density distribution information about minority samples to the GAN and to the generated samples. Building on this idea, three detailed strategies are presented: (1) input random vectors of the generator are sampled from a rough estimate of the distribution of minority samples to make fake samples more believable; (2) a residual about minority samples is added to the discriminator to strengthen the constraint of the loss function; (3) generated samples are redistributed with a reshaper. These three strategies provide innovative methodologies at various stages of GANs for the oversampling task. Compared with 22 classical and popular imbalanced sampling methods under the metrics of Gm, F1, and AUC on 24 benchmark imbalanced datasets, I-GAN is shown to be effective and robust. The I-GAN implementation has been uploaded to GitHub (https://github.com/flowerbloom000/I-GAN).
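The first strategy in the abstract, drawing generator inputs from a rough estimate of the minority-class distribution instead of from a fixed noise prior, can be illustrated with a minimal sketch. The abstract does not say which estimator I-GAN uses, so the per-feature (diagonal) Gaussian fit below is purely an assumption, as are the function names `fit_diagonal_gaussian` and `sample_generator_inputs` and the choice to give the input vectors the same dimension as the features. It is not the authors' implementation, which is in the repository linked above.

```python
import random
import statistics


def fit_diagonal_gaussian(minority):
    """Rough density estimate (assumed): per-feature mean and std of minority samples."""
    columns = list(zip(*minority))  # transpose rows into feature columns
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds


def sample_generator_inputs(minority, n, seed=0):
    """Draw n generator input vectors from the fitted diagonal Gaussian.

    A standard GAN would instead draw these from a fixed prior such as N(0, I).
    """
    rng = random.Random(seed)
    means, stds = fit_diagonal_gaussian(minority)
    return [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(n)]


# Toy minority class with two features (hypothetical data).
minority = [[1.0, 10.0], [1.2, 11.0], [0.8, 9.5], [1.1, 10.5]]
z = sample_generator_inputs(minority, n=5)
```

Here `z` holds five input vectors centred on the minority-class feature means. In a full pipeline they would be passed to the generator in place of standard normal noise.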
Pages: 14
Related papers
50 in total
  • [31] Data Augmentation Generated by Generative Adversarial Network for Small Sample Datasets Clustering
    Hui Yu
    Qiao Feng Wang
    Jian Yu Shi
    [J]. Neural Processing Letters, 2023, 55 : 8365 - 8384
  • [32] Data Augmentation Generated by Generative Adversarial Network for Small Sample Datasets Clustering
    Yu, Hui
    Wang, Qiao Feng
    Shi, Jian Yu
    [J]. NEURAL PROCESSING LETTERS, 2023, 55 (06) : 8365 - 8384
  • [33] Fairness GAN: Generating datasets with fairness properties using a generative adversarial network
    Sattigeri, P.
    Hoffman, S. C.
    Chenthamarakshan, V
    Varshney, K. R.
    [J]. IBM JOURNAL OF RESEARCH AND DEVELOPMENT, 2019, 63 (4-5)
  • [34] Data Augmentation for Imbalanced HRRP Recognition Using Deep Convolutional Generative Adversarial Network
    Song, Yiheng
    Li, Yang
    Wang, Yanhua
    Hu, Cheng
    [J]. IEEE ACCESS, 2020, 8 : 201686 - 201695
  • [35] A Multi-Index Generative Adversarial Network for Tool Wear Detection with Imbalanced Data
    Zhang, Guokai
    Xiao, Haoping
    Jiang, Jingwen
    Liu, Qinyuan
    Liu, Yimo
    Wang, Liying
    [J]. COMPLEXITY, 2020, 2020
  • [36] Imbalanced spectral data analysis using data augmentation based on the generative adversarial network
    Chung, Jihoon
    Zhang, Junru
    Saimon, Amirul Islam
    Liu, Yang
    Johnson, Blake N.
    Kong, Zhenyu
    [J]. SCIENTIFIC REPORTS, 2024, 14 (01):
  • [37] Imbalanced Fault Diagnosis of Rolling Bearing Based on Generative Adversarial Network: A Comparative Study
    Mao, Wentao
    Liu, Yamin
    Ding, Ling
    Li, Yuan
    [J]. IEEE ACCESS, 2019, 7 : 9515 - 9530
  • [38] SYNTHETIC MINORITY CLASS DATA BY GENERATIVE ADVERSARIAL NETWORK FOR IMBALANCED SAR TARGET RECOGNITION
    Luo, Zhongming
    Jiang, Xue
    Liu, Xingzhao
    [J]. IGARSS 2020 - 2020 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2020, : 2459 - 2462
  • [39] An ensemble oversampling method for imbalanced classification with prior knowledge via generative adversarial network
    Zhang, Yulin
    Liu, Yuchen
    Wang, Yan
    Yang, Jie
    [J]. CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2023, 235
  • [40] Ensembled Deep Convolutional Generative Adversarial Network for Grading Imbalanced Diabetic Retinopathy Recognition
    Naz, Huma
    Nijhawan, Rahul
    Ahuja, Neelu Jyothi
    Al-Otaibi, Shaha
    Saba, Tanzila
    Bahaj, Saeed Ali
    Rehman, Amjad
    [J]. IEEE ACCESS, 2023, 11 : 120554 - 120568