Learning From Imbalanced Data With Deep Density Hybrid Sampling

Cited by: 8

Authors:

Liu, Chien-Liang [1]

Chang, Yu-Hua [1]

Affiliations:

[1] Natl Yang Ming Chiao Tung Univ, Dept Ind Engn & Management, Hsinchu 30010, Taiwan

Source:

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2022 / Vol. 52 / No. 11

Keywords:

Boosting; Training; Euclidean distance; Sampling methods; Costs; Hybrid power systems; Estimation; Class imbalance; embedding network; hybrid sampling; imbalanced data; synthetic data; SMOTE; CLASSIFICATION

DOI:

10.1109/TSMC.2022.3151394

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Learning from imbalanced data is an important and challenging topic in machine learning. Many works have devised methods to cope with imbalanced data, but most methods only consider minority or majority classes without considering the relationship between the two classes. In addition, many synthetic minority oversampling technique-based methods generate synthetic samples from the original feature space and use the Euclidean distance to search for the nearest neighbors. However, the Euclidean distance is not a precise distance metric in a high-dimensional space. This article proposes a novel method, called deep density hybrid sampling (DDHS), to address imbalanced data problems. The proposed method learns an embedding network to project the data samples into a low-dimensional separable latent space. The goal is to preserve class proximity during data projection, and we use within-class and between-class concepts to devise loss functions. We propose to use density as a criterion to select minority and majority samples. Subsequently, we apply a feature-level approach to the selected minority samples and generate diverse and valid synthetic samples for the minority class. This work conducts extensive experiments to assess our proposed method and compare it with several methods. The experimental results show that the proposed method can yield promising and stable results. The proposed method is a data-level algorithm, and we combine the proposed method with the boosting technique to develop a method called DDHS-boosting. We compare DDHS-boosting with several ensemble methods, and DDHS-boosting shows promising results.
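The density-based selection and feature-level generation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the latent embeddings have already been produced by some embedding network (random data stands in for them here), it uses a k-nearest-neighbor density proxy where the paper may use a different estimator, and it interprets "feature-level" as drawing each feature independently from the selected minority samples. The function names, the keep ratios, and `k` are all hypothetical.

```python
import numpy as np

def knn_density(points, k=5):
    """Density proxy (an assumption): inverse mean distance to the k nearest neighbors.
    Needs at least two points."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting, column 0 is each point's zero distance to itself; skip it.
    nearest = np.sort(dists, axis=1)[:, 1:k + 1]
    return 1.0 / (nearest.mean(axis=1) + 1e-12)

def select_dense(points, keep_ratio, k=5):
    """Keep the keep_ratio fraction of points with the highest density."""
    n_keep = max(1, int(round(keep_ratio * len(points))))
    order = np.argsort(-knn_density(points, k))
    return points[order[:n_keep]]

def feature_level_synthesis(points, n_new, rng):
    """Build each synthetic sample by drawing every feature independently
    from the values that feature takes among the given points."""
    n_features = points.shape[1]
    rows = rng.integers(0, len(points), size=(n_new, n_features))
    cols = np.arange(n_features)
    return points[rows, cols]  # result[i, j] = points[rows[i, j], j]

# Stand-in latent embeddings (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
z_major = rng.normal(0.0, 1.0, size=(200, 2))
z_minor = rng.normal(3.0, 1.0, size=(20, 2))

# Hypothetical keep ratios: retain the denser half of each class.
major_kept = select_dense(z_major, keep_ratio=0.5)
minor_kept = select_dense(z_minor, keep_ratio=0.5)

# Generate enough synthetic minority samples to match the kept majority count.
synthetic = feature_level_synthesis(minor_kept, len(major_kept) - len(minor_kept), rng)
balanced_minor = np.vstack([minor_kept, synthetic])
```

Drawing features independently yields samples that differ from any single original point while each coordinate stays within values already observed for the minority class. The abstract also mentions that the generated samples should be valid, which this sketch does not check.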


Pages: 7065-7077

Number of pages: 13
