Learning From Imbalanced Data With Deep Density Hybrid Sampling

Cited by: 8
Authors
Liu, Chien-Liang [1]
Chang, Yu-Hua [1]
Affiliation
[1] National Yang Ming Chiao Tung University, Department of Industrial Engineering and Management, Hsinchu 30010, Taiwan
Keywords
Boosting; Training; Euclidean distance; Sampling methods; Costs; Hybrid power systems; Estimation; Class imbalance; embedding network; hybrid sampling; imbalanced data; synthetic data; SMOTE; Classification
DOI
10.1109/TSMC.2022.3151394
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Learning from imbalanced data is an important and challenging topic in machine learning. Many methods have been devised to cope with imbalanced data, but most consider only the minority class or only the majority class, ignoring the relationship between the two. In addition, many methods based on the synthetic minority oversampling technique (SMOTE) generate synthetic samples in the original feature space and use the Euclidean distance to search for nearest neighbors. However, the Euclidean distance becomes an unreliable distance metric in high-dimensional spaces. This article proposes a novel method, called deep density hybrid sampling (DDHS), to address imbalanced data problems. The proposed method learns an embedding network that projects the data samples into a low-dimensional, separable latent space. The goal is to preserve class proximity during the projection, and we design the loss functions around within-class and between-class concepts. We propose using density as the criterion for selecting minority and majority samples. We then apply a feature-level approach to the selected minority samples to generate diverse and valid synthetic samples for the minority class. We conduct extensive experiments to evaluate the proposed method and compare it with several existing methods. The experimental results show that the proposed method yields promising and stable results. Because DDHS is a data-level algorithm, it can be combined with boosting, and we combine the two to develop a method called DDHS-boosting. We compare DDHS-boosting with several ensemble methods, and DDHS-boosting also shows promising results.
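The abstract names a pipeline of embedding, density-based selection, and synthetic-sample generation but gives no formulas. The sketch below illustrates only the selection and generation steps, and it rests on assumptions that do not come from the paper: the inputs are taken to be latent vectors already produced by an embedding network (not shown), density is estimated as the inverse mean Euclidean distance to the k nearest neighbors, the densest minority and majority samples are retained, and new minority samples are produced by SMOTE-style linear interpolation between random pairs of retained minority samples, a swapped-in stand-in for the paper's feature-level generator. The function names `knn_density`, `select_by_density`, `interpolate`, and `hybrid_sample` and the parameters `k`, `keep_min`, and `keep_maj` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_density(points: Sequence[Point], k: int) -> List[float]:
    """Hypothetical density estimate: 1 / mean distance to the k nearest neighbours."""
    densities = []
    for i, p in enumerate(points):
        # Euclidean distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn = sum(dists[:k]) / k
        densities.append(1.0 / (mean_knn + 1e-12))  # epsilon guards exact duplicates
    return densities


def select_by_density(points: Sequence[Point], k: int, keep_ratio: float,
                      densest: bool = True) -> List[Point]:
    """Keep a fraction of the points ranked by k-NN density (densest first by default)."""
    dens = knn_density(points, k)
    order = sorted(range(len(points)), key=lambda i: dens[i], reverse=densest)
    n_keep = max(1, round(keep_ratio * len(points)))
    return [points[i] for i in order[:n_keep]]


def interpolate(seeds: Sequence[Point], n_new: int, rng: random.Random) -> List[Point]:
    """SMOTE-style stand-in: each new point lies on the segment between two random seeds."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(list(seeds), 2)
        u = rng.random()
        synthetic.append(tuple(x + u * (y - x) for x, y in zip(a, b)))
    return synthetic


def hybrid_sample(minority: Sequence[Point], majority: Sequence[Point], k: int = 3,
                  keep_min: float = 0.8, keep_maj: float = 0.8, seed: int = 0):
    """Undersample the majority class, then oversample the minority class to match it.

    All original minority samples are kept; only the densest ones act as seeds.
    """
    rng = random.Random(seed)
    min_seeds = select_by_density(minority, k, keep_min)  # seeds for generation
    maj_kept = select_by_density(majority, k, keep_maj)   # densest majority samples
    n_new = max(0, len(maj_kept) - len(minority))
    synthetic = interpolate(min_seeds, n_new, rng)
    return list(minority) + synthetic, maj_kept


if __name__ == "__main__":
    rng = random.Random(42)
    minority = [(rng.gauss(2, 0.5), rng.gauss(2, 0.5)) for _ in range(10)]
    majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
    new_min, new_maj = hybrid_sample(minority, majority)
    print(len(new_min), len(new_maj))  # 40 40
```

Keeping the densest points is only one plausible reading of "density as a criterion"; passing `densest=False` to `select_by_density` would keep the sparsest samples instead, and the paper itself should be consulted for the actual rule. The pairwise distance loop is O(n²), which suits small illustrative inputs only.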
Pages: 7065-7077
Number of pages: 13