On oversampling imbalanced data with deep conditional generative models

Cited by: 23
Authors
Fajardo, Val Andrei [1]
Findlay, David [1]
Jaiswal, Charu [1]
Yin, Xinshang [1]
Houmanfar, Roshanak [1]
Xie, Honglei [1]
Liang, Jiaxi [1]
She, Xichen [1]
Emerson, D. B. [1]
Affiliation
[1] Integrate Ai, 480 Univ Ave, Toronto, ON, Canada
Keywords
Deep generative models; Conditional variational autoencoders; Class imbalance; Oversampling;
DOI
10.1016/j.eswa.2020.114463
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Class-imbalanced datasets are common in real-world applications ranging from credit card fraud detection to rare disease diagnosis. Recently, deep generative models have proved successful for an array of machine learning problems such as semi-supervised learning, transfer learning, and recommender systems. However, their application to class imbalance problems has been limited. In this paper, we consider class-conditional variants of generative adversarial networks and variational autoencoders and apply them to the imbalance problem. The main question we seek to answer is whether deep conditional generative models can learn the distributions of minority classes well enough to produce synthetic observations that improve the performance of a downstream classifier. The numerical results show that they can, and that deep generative models outperform traditional oversampling methods in many circumstances, especially under severe imbalance.
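The oversampling workflow the abstract describes (fit a class-conditional model, sample synthetic minority observations, then train a downstream classifier on the augmented data) can be illustrated with a minimal sketch. This is not the paper's method: a per-class Gaussian stands in for the conditional GAN or conditional VAE, and the function names `fit_class_gaussians` and `oversample_to_balance`, the `ridge` parameter, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def fit_class_gaussians(X, y, ridge=1e-6):
    """Fit one Gaussian per class as a crude stand-in for a learned p(x | y).

    Assumes every class has at least two observations.
    """
    params = {}
    for label in np.unique(y):
        Xc = X[y == label]
        mean = Xc.mean(axis=0)
        # rowvar=False: rows are observations, columns are features.
        # The small ridge term keeps the covariance well-conditioned.
        cov = np.atleast_2d(np.cov(Xc, rowvar=False)) + ridge * np.eye(X.shape[1])
        params[label] = (mean, cov)
    return params


def oversample_to_balance(X, y, seed=0):
    """Add synthetic samples until every class matches the largest class."""
    rng = np.random.default_rng(seed)
    params = fit_class_gaussians(X, y)
    labels, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for label, count in zip(labels, counts):
        n_new = target - count
        if n_new == 0:
            continue  # the majority class needs no synthetic samples
        mean, cov = params[label]
        X_parts.append(rng.multivariate_normal(mean, cov, size=n_new))
        y_parts.append(np.full(n_new, label))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Hypothetical toy data: 500 majority vs. 10 minority observations.
rng = np.random.default_rng(42)
X_major = rng.normal(0.0, 1.0, size=(500, 2))
X_minor = rng.normal(3.0, 0.5, size=(10, 2))
X = np.vstack([X_major, X_minor])
y = np.concatenate([np.zeros(500, dtype=int), np.ones(10, dtype=int)])

X_bal, y_bal = oversample_to_balance(X, y)
```

The original observations are kept as the first rows of `X_bal`, with the synthetic minority samples appended after them, so the balanced set can be passed directly to any downstream classifier. Replacing the Gaussian with a trained conditional generative model would change only the sampling step.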
Pages: 12
Related papers
50 in total
  • [31] Implications of data topology for deep generative models
    Jin, Yinzhu
    Mcdaniel, Rory
    Tatro, N. Joseph
    Catanzaro, Michael J.
    Smith, Abraham D.
    Bendich, Paul
    Dwyer, Matthew B.
    Fletcher, P. Thomas
    [J]. FRONTIERS IN COMPUTER SCIENCE, 2024, 6
  • [32] Learning Structured Output Representation using Deep Conditional Generative Models
    Sohn, Kihyuk
    Yan, Xinchen
    Lee, Honglak
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [33] Oversampling boosting for classification of imbalanced software defect data
    Li, Guangling
    Wang, Shihai
    [J]. PROCEEDINGS OF THE 35TH CHINESE CONTROL CONFERENCE 2016, 2016, : 4149 - 4154
  • [34] Boosting imbalanced data learning with Wiener process oversampling
    Li, Qian
    Li, Gang
    Niu, Wenjia
    Cao, Yanan
    Chang, Liang
    Tan, Jianlong
    Guo, Li
    [J]. FRONTIERS OF COMPUTER SCIENCE, 2017, 11 (05) : 836 - 851
  • [35] Oversampling for Imbalanced Data Classification Using Adversarial Network
    Lee, Sang-Kwang
    Hong, Seung-Jin
    Yang, Seong-Il
    [J]. 2018 INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY CONVERGENCE (ICTC), 2018, : 1255 - 1257
  • [36] Boosting imbalanced data learning with Wiener process oversampling
    Qian Li
    Gang Li
    Wenjia Niu
    Yanan Cao
    Liang Chang
    Jianlong Tan
    Li Guo
    [J]. Frontiers of Computer Science, 2017, 11 : 836 - 851
  • [37] Noise-robust oversampling for imbalanced data classification
    Liu, Yongxu
    Liu, Yan
    Yu, Bruce X. B.
    Zhong, Shenghua
    Hu, Zhejing
    [J]. PATTERN RECOGNITION, 2023, 133
  • [38] A deep multimodal generative and fusion framework for class-imbalanced multimodal data
    Li, Qing
    Yu, Guanyuan
    Wang, Jun
    Liu, Yuehao
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (33-34) : 25023 - 25050
  • [39] Machinery fault diagnosis with imbalanced data using deep generative adversarial networks
    Zhang, Wei
    Li, Xiang
    Jia, Xiao-Dong
    Ma, Hui
    Luo, Zhong
    Li, Xu
    [J]. MEASUREMENT, 2020, 152
  • [40] Oversampling Method for Imbalanced Data Using Credible Counterfactual
    Gao, Feng
    Song, Mei
    Zhu, Yi
    [J]. Computer Engineering and Applications, 2024, 60 (05) : 165 - 171