Conditional Data Synthesis with Deep Generative Models for Imbalanced Dataset Oversampling

Authors
Akritidis, Leonidas [1]
Fevgas, Athanasios [2]
Alamaniotis, Miltiadis [2]
Bozanis, Panayiotis [1]
Affiliations
[1] Intl Hellen Univ, Sch Sci & Technol, Thessaloniki, Greece
[2] Univ Thessaly, Dept Elect & Comp Engn, Volos, Greece
Keywords
imbalanced datasets; oversampling; generative models; GAN; VAE; SMOTE
DOI
10.1109/ICTAI59109.2023.00071
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The problem of data imbalance is defined as the uneven distribution of training examples across the classes of a dataset. Among a wide variety of solutions, oversampling techniques try to mitigate the problem by synthesizing artificial examples associated with the minority class. The huge success of Generative Adversarial Networks (GANs) rendered them an attractive choice for oversampling, and numerous researchers proposed modifications of GANs for imbalanced datasets. Nevertheless, the existing models employ the entire minority class for sample generation, thus being vulnerable to outliers and noisy data instances. In addition, the majority of the relevant research concerns image classification tasks, leaving a large gap for research with tabular data. Finally, another powerful and popular generative model, the Variational Autoencoder (VAE), has been rather overlooked by the community in class imbalance solutions. In this paper we present SB-GAN and SB-VAE, two generative models that identify borderline and noisy samples before they are trained. In this manner, SB-GAN and SB-VAE learn better class distributions that are not distorted by the existence of outliers. The experimental evaluation of SB-GAN and SB-VAE with 4 tabular datasets revealed a superior performance against 8 state-of-the-art oversampling techniques.
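The abstract does not say how borderline and noisy samples are identified. As a rough illustration only, the sketch below uses the k-nearest-neighbor rule familiar from Borderline-SMOTE: a minority sample is treated as noisy when all of its k nearest neighbors belong to other classes, as borderline when at least half (but not all) do, and as safe otherwise. The function name `categorize_minority`, the thresholds, and the default `k` are assumptions made for this example, not the procedure of SB-GAN or SB-VAE.

```python
import math


def categorize_minority(X, y, minority_label, k=5):
    """Tag each minority sample as 'safe', 'borderline', or 'noisy'.

    Hypothetical helper: the rule follows the Borderline-SMOTE
    convention and is NOT taken from the SB-GAN/SB-VAE paper.
    Returns a dict mapping sample index -> category.
    """
    categories = {}
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority_label:
            continue
        # Sort all other samples by Euclidean distance and keep the k closest.
        neighbors = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )[:k]
        # Count neighbors that do not share the minority label.
        n_other = sum(1 for _, j in neighbors if y[j] != minority_label)
        if n_other == len(neighbors):
            categories[i] = "noisy"        # surrounded by other classes
        elif 2 * n_other >= len(neighbors):
            categories[i] = "borderline"   # at least half are other classes
        else:
            categories[i] = "safe"
    return categories
```

Under this assumption, a generative model would then be fitted only on the minority samples that are not tagged as noisy, so that isolated outliers do not distort the learned class distribution.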
Pages: 444-451
Page count: 8
Related papers
50 in total
  • [1] On oversampling imbalanced data with deep conditional generative models
    Fajardo, Val Andrei
    Findlay, David
    Jaiswal, Charu
    Yin, Xinshang
    Houmanfar, Roshanak
    Xie, Honglei
    Liang, Jiaxi
    She, Xichen
    Emerson, D. B.
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2021, 169 (169)
  • [2] Oversampling Highly Imbalanced Indoor Positioning Data using Deep Generative Models
    Alhomayani, Fahad
    Mahoor, Mohammad H.
    [J]. 2021 IEEE SENSORS, 2021,
  • [3] Binary imbalanced data classification based on diversity oversampling by generative models
    Zhai, Junhai
    Qi, Jiaxing
    Shen, Chu
    [J]. INFORMATION SCIENCES, 2022, 585 : 313 - 343
  • [4] Oversampling Tabular Data with Deep Generative Models: Is it worth the effort?
    Camino, Ramiro D.
    State, Radu
    Hammerschmidt, Christian A.
    [J]. NEURIPS WORKSHOPS, 2020, 2020, 137 : 148 - 157
  • [5] Generative Oversampling Methods for Handling Imbalanced Data in Software Fault Prediction
    Rathore, Santosh Singh
    Chouhan, Satyendra Singh
    Jain, Dixit Kumar
    Vachhani, Aakash Gopal
    [J]. IEEE TRANSACTIONS ON RELIABILITY, 2022, 71 (02) : 747 - 762
  • [6] Generative Oversampling Method for Imbalanced Data on Bearing Fault Detection and Diagnosis
    Suh, Sungho
    Lee, Haebom
    Jo, Jun
    Lukowicz, Paul
    Lee, Yong Oh
    [J]. APPLIED SCIENCES-BASEL, 2019, 9 (04):
  • [7] Generative Oversampling Method (GenOMe) for Imbalanced Data on Apnea Detection using ECG Data
    Sanabila, H. R.
    Kusuma, Ilham
    Jatmiko, Wisnu
    [J]. 2016 INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER SCIENCE AND INFORMATION SYSTEMS (ICACSIS), 2016, : 572 - 577
  • [8] Conditional Wasserstein GAN-based oversampling of tabular data for imbalanced learning
    Engelmann, Justin
    Lessmann, Stefan
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2021, 174
  • [9] Conditional Molecular Design with Deep Generative Models
    Kang, Seokho
    Cho, Kyunghyun
    [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2019, 59 (01) : 43 - 52
  • [10] Explainability of SMOTE Based Oversampling for Imbalanced Dataset Problems
    Patil, Aum
    Framewala, Aman
    Kazi, Faruk
    [J]. 2020 3RD INTERNATIONAL CONFERENCE ON INFORMATION AND COMPUTER TECHNOLOGIES (ICICT 2020), 2020, : 41 - 45