An intra-class distribution-focused generative adversarial network approach for imbalanced tabular data learning

Cited by: 1
Authors
Chen, Qiuling [1, 2]
Ye, Ayong [1, 2]
Zhang, Yuexin [1, 2]
Chen, Jianwei [1, 2]
Huang, Chuan [1, 2]
Affiliations
[1] Fujian Normal Univ, Coll Comp & Cyber Secur, Fuzhou 350007, Peoples R China
[2] Fujian Prov Key Lab Network Secur & Cryptol, Fuzhou 350007, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Imbalanced data learning; Oversampling; Clustering; Generative adversarial network; HYBRID APPROACH; SMOTE;
DOI
10.1007/s13042-023-02048-5
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Data imbalance is a critical factor that adversely affects the performance of machine learning algorithms. It shifts decision boundaries, biasing predictions towards the majority class and causing the minority class to be misclassified. Although oversampling the minority class with deep generative models is a popular strategy, many existing methods focus solely on augmenting minority class data while overlooking the distribution relationships within and between classes. Therefore, we propose an oversampling method that combines unsupervised clustering with a generative adversarial network (GAN) to facilitate learning from imbalanced tabular data. First, we preprocess the original data by clustering, remove clusters that do not require sampling, and generate more samples for sparsely distributed minority class clusters to achieve sample balance within the minority class. Moreover, we design a CTGAN-based auxiliary classifier GAN (ACCTGAN) to generate minority class samples. It enhances the semantic integrity of the synthetic data and avoids generating noisy samples. We conducted validation experiments comparing our approach with 7 typical methods on 12 real tabular datasets. Our method shows excellent performance in F1-measure and area under the curve (AUC), obtaining 19 and 20 best results, respectively, across the three classifiers. It significantly enhances classification results and demonstrates good robustness and stability.
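The cluster-level preprocessing step in the abstract (drop clusters that need no sampling, then favour sparse minority clusters) can be illustrated with a minimal sketch. The abstract does not state the filtering criterion or the allocation rule, so both are assumptions here: clusters are kept when their minority share reaches a hypothetical `min_minority_ratio`, and the synthetic-sample budget is split in inverse proportion to each kept cluster's minority count. The function name and parameters are illustrative, not taken from the paper, and the cluster assignments are assumed to come from any unsupervised clustering run beforehand.

```python
from collections import Counter


def allocate_synthetic_samples(cluster_ids, labels, minority_label,
                               n_to_generate, min_minority_ratio=0.5):
    """Return {cluster_id: number of synthetic minority samples to generate}.

    Assumed rules (not specified in the abstract):
      * a cluster is kept only if its minority share >= min_minority_ratio;
      * the budget is split in inverse proportion to the minority count,
        so sparser minority clusters receive more synthetic samples.
    """
    total = Counter(cluster_ids)
    minority = Counter(c for c, y in zip(cluster_ids, labels)
                       if y == minority_label)

    # Filter step: discard clusters with no minority samples or a low share.
    kept = [c for c in total
            if minority[c] > 0 and minority[c] / total[c] >= min_minority_ratio]
    if not kept:
        return {}

    # Sparsity weight: fewer minority samples means a larger weight.
    weights = {c: 1.0 / minority[c] for c in kept}
    weight_sum = sum(weights.values())
    allocation = {c: int(n_to_generate * weights[c] / weight_sum) for c in kept}

    # Rounding down can leave a few samples unassigned; give them to the
    # sparsest clusters first so the allocation sums to the budget.
    leftover = n_to_generate - sum(allocation.values())
    for c in sorted(kept, key=lambda c: minority[c])[:leftover]:
        allocation[c] += 1
    return allocation
```

The returned per-cluster counts would then tell a conditional generator (ACCTGAN in the paper, or any other sampler in this sketch) how many minority samples to draw for each cluster.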
Pages: 2551-2572
Number of pages: 22
Related papers
50 in total
  • [1] Data Augmentation for Intra-class Imbalance with Generative Adversarial Network
    Hase, Natsuki
    Ito, Seiya
    Kaneko, Naoshi
    Sumi, Kazuhiko
    [J]. FOURTEENTH INTERNATIONAL CONFERENCE ON QUALITY CONTROL BY ARTIFICIAL VISION, 2019, 11172
  • [2] Distribution Enhancement for Imbalanced Data with Generative Adversarial Network
    Chen, Yueqi
    Pedrycz, Witold
    Pan, Tingting
    Wang, Jian
    Yang, Jie
    [J]. ADVANCED THEORY AND SIMULATIONS, 2024,
  • [3] Addressing the class imbalance in tabular datasets from a generative adversarial network approach in supervised machine learning
    Sanchez-Gutierrez, Maximo E.
    Gonzalez-Perez, Pedro P.
    [J]. JOURNAL OF ALGORITHMS & COMPUTATIONAL TECHNOLOGY, 2023, 17
  • [4] SYNTHETIC MINORITY CLASS DATA BY GENERATIVE ADVERSARIAL NETWORK FOR IMBALANCED SAR TARGET RECOGNITION
    Luo, Zhongming
    Jiang, Xue
    Liu, Xingzhao
    [J]. IGARSS 2020 - 2020 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2020, : 2459 - 2462
  • [5] Synthetic augmentation for semantic segmentation of class imbalanced biomedical images: A data pair generative adversarial network approach
    Chai, Lu
    Wang, Zidong
    Chen, Jianqing
    Zhang, Guokai
    Alsaadi, Fawaz E.
    Alsaadi, Fuad E.
    Liu, Qinyuan
    [J]. COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 150
  • [6] Local Tangent Generative Adversarial Network for Imbalanced Data Classification
    Li, Zhihao
    Yu, Zhiwen
    Yang, Kaixiang
    Shi, Yifan
    Xu, Yuhong
    Chen, C. L. Philip
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [7] CasTGAN: Cascaded Generative Adversarial Network for Realistic Tabular Data Synthesis
    Alshantti, Abdallah
    Varagnolo, Damiano
    Rasheed, Adil
    Rahmati, Aria
    Westad, Frank
    [J]. IEEE ACCESS, 2024, 12 : 13213 - 13232
  • [8] An Imbalanced Generative Adversarial Network-Based Approach for Network Intrusion Detection in an Imbalanced Dataset
    Rao, Yamarthi Narasimha
    Babu, Kunda Suresh
    [J]. SENSORS, 2023, 23 (01)
  • [9] Data Augment in Imbalanced Learning Based on Generative Adversarial Networks
    Zhou, Zhuocheng
    Zhang, Bofeng
    Lv, Ying
    Shi, Tian
    Chang, Furong
    [J]. NEURAL INFORMATION PROCESSING (ICONIP 2019), PT IV, 2019, 1142 : 21 - 30
  • [10] Distribution Bias Aware Collaborative Generative Adversarial Network for Imbalanced Deep Learning in Industrial IoT
    Zhou, Xiaokang
    Hu, Yiyong
    Wu, Jiayi
    Liang, Wei
    Ma, Jianhua
    Jin, Qun
    [J]. IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2023, 19 (01) : 570 - 580