Distribution Enhancement for Imbalanced Data with Generative Adversarial Network

Authors
Chen, Yueqi [1 ,2 ]
Pedrycz, Witold [3 ,4 ,5 ]
Pan, Tingting [6 ]
Wang, Jian [7 ]
Yang, Jie [1 ,2 ]
Affiliations
[1] Dalian Univ Technol, Sch Math Sci, Dalian 116024, Liaoning, Peoples R China
[2] Key Lab Computat Math & Data Intelligence Liaoning, Dalian 116024, Liaoning, Peoples R China
[3] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB T6G 2R3, Canada
[4] Polish Acad Sci, Syst Res Inst, PL-00901 Warsaw, Mazowieckie, Poland
[5] Istinye Univ, Fac Engn & Nat Sci, Dept Comp Engn, TR-34010 Istanbul, Turkiye
[6] Dalian Polytech Univ, Dept Basic Courses Teaching, Dalian 116024, Liaoning, Peoples R China
[7] China Univ Petr East China, Coll Sci, Qingdao 266580, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
GAN (Generative Adversarial Network); imbalanced learning; mode collapse; oversampling
DOI
10.1002/adts.202400234
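Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]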
Discipline classification code
07; 0710; 09
Abstract
Tackling the imbalanced problems encountered in real-world applications remains a challenge. Oversampling is a widely used method for imbalanced tabular data. However, most traditional oversampling methods generate samples by interpolating within the minority (positive) class and therefore fail to fully capture the probability density distribution of the original data. In this paper, a novel oversampling method based on a generative adversarial network (GAN), called GAN-E, is presented. Its originality lies in three strategies that enhance the distribution of the positive class. The first strategy injects prior knowledge of the positive class into the latent space of the GAN, improving sample emulation. The second strategy injects random noise containing this prior knowledge into both the original and the generated positive samples, stretching the learning space of the GAN's discriminator. The third strategy uses multiple GANs to learn comprehensive probability distributions of the positive class from multi-scale data, in order to eliminate the tendency of a GAN to generate aggregated samples. Experimental results and statistical tests on 18 commonly used imbalanced datasets show that the proposed method outperforms 14 other rebalancing methods in terms of G-mean, F-measure, AUC, and accuracy.
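The abstract names G-mean and F-measure among its evaluation metrics without defining them. The sketch below is not taken from the paper; it assumes the definitions commonly used in binary imbalanced learning (G-mean as the geometric mean of sensitivity and specificity, F-measure as the harmonic mean of precision and recall), and all function names are illustrative.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary problem (assumed helper, not from the paper)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred, positive=1):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (positive recall) and specificity (negative recall)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall on the positive class (F1)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: 2 positive (minority) and 8 negative (majority) samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10                      # ignores the minority class
mixed = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]          # one hit, one miss, one false alarm
```

On this toy data both predictors reach an accuracy of 0.8, yet the always-negative predictor scores 0 on G-mean and F-measure. That gap is the usual reason such metrics are reported alongside accuracy for imbalanced data.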
Pages: 16