On oversampling imbalanced data with deep conditional generative models

Cited by: 23
Authors
Fajardo, Val Andrei [1]
Findlay, David [1]
Jaiswal, Charu [1]
Yin, Xinshang [1]
Houmanfar, Roshanak [1]
Xie, Honglei [1]
Liang, Jiaxi [1]
She, Xichen [1]
Emerson, D. B. [1]
Affiliations
[1] Integrate Ai, 480 Univ Ave, Toronto, ON, Canada
Keywords
Deep generative models; Conditional variational autoencoders; Class imbalance; Oversampling
DOI
10.1016/j.eswa.2020.114463
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Class-imbalanced datasets are common in real-world applications ranging from credit card fraud detection to rare disease diagnosis. Recently, deep generative models have proved successful for an array of machine learning problems such as semi-supervised learning, transfer learning, and recommender systems. However, their application to class imbalance has been limited. In this paper, we consider class-conditional variants of generative adversarial networks and variational autoencoders and apply them to the imbalance problem. The main question we seek to answer is whether deep conditional generative models can learn the distributions of minority classes well enough to produce synthetic observations that improve the performance of a downstream classifier. The numerical results show that they can, and that deep generative models outperform traditional oversampling methods in many circumstances, especially in cases of severe imbalance.
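The oversampling workflow the abstract describes (model each class, sample synthetic minority observations, then train a downstream classifier on the balanced data) can be sketched roughly as follows. This is a minimal illustration, not the authors' method: a per-class Gaussian stands in for a trained conditional GAN or conditional VAE, and the function name `oversample_with_conditional_sampler` and the toy data are assumptions made for this example.

```python
import numpy as np

def oversample_with_conditional_sampler(X, y, rng):
    """Balance classes by drawing synthetic rows from a per-class model.

    Assumption: a Gaussian fitted to each class is used as a simple
    stand-in for a deep conditional generative model p(x | y).
    Each class needs at least two rows so a covariance can be estimated.
    """
    classes, counts = np.unique(y, return_counts=True)
    target = int(counts.max())
    X_parts, y_parts = [X], [y]
    for cls, n in zip(classes, counts):
        deficit = target - int(n)
        if deficit == 0:
            continue  # majority class: nothing to add
        X_cls = X[y == cls]
        mean = X_cls.mean(axis=0)
        # Small ridge keeps the covariance positive definite.
        cov = np.cov(X_cls, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        synthetic = rng.multivariate_normal(mean, cov, size=deficit)
        X_parts.append(synthetic)
        y_parts.append(np.full(deficit, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)

# Toy data (hypothetical): 200 majority rows, 10 minority rows.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(3.0, 0.5, size=(10, 2))])
y = np.concatenate([np.zeros(200, dtype=int), np.ones(10, dtype=int)])

X_bal, y_bal = oversample_with_conditional_sampler(X, y, rng)
```

In the setting the abstract describes, the Gaussian step would be replaced by sampling from a generator conditioned on the minority class label; the balancing logic around it stays the same.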
Pages: 12