Gaussian Distribution Based Oversampling for Imbalanced Data Classification

Cited by: 0
Authors
Xie, Yuxi [1]
Qiu, Min [1]
Zhang, Haibo [2]
Peng, Lizhi [1]
Chen, Zhenxiang [1]
Affiliations
[1] Univ Jinan, Shandong Prov Key Lab Network Based Intelligent C, Jinan 250022, Peoples R China
[2] Univ Otago, Dept Comp Sci, Dunedin 9016, New Zealand
Funding
National Natural Science Foundation of China
Keywords
Gaussian distribution; Data models; Adaptation models; Probabilistic logic; Internet; Cancer; Machine learning; Imbalanced learning; oversampling; probabilistic anchor selection; Gaussian resampling; MACHINE; SMOTE; ENSEMBLE
DOI
10.1109/TKDE.2020.2985965
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Imbalanced data classification problems arise in many real-world applications. Data resampling, through either oversampling or undersampling, is a promising technique for dealing with imbalanced data. However, traditional data resampling approaches take into account only local neighbor information and generate new instances in linear ways, which leads to incorrect and unnecessary instances. In this study, we propose a new data resampling technique, Gaussian Distribution based Oversampling (GDO), to handle imbalanced data for classification. In GDO, anchor instances are selected from the minority class in a probabilistic way that takes into account the density and distance information carried by the minority instances. New minority instances are then generated following a Gaussian distribution model. The proposed method is validated in an experimental study that compares it with seven imbalanced learning approaches on 40 data sets from the KEEL repository and 10 large data sets from the UCI repository. Experimental results show that our method outperforms the compared methods in terms of AUC, G-mean and memory usage, at the cost of an increase in running time. We also apply GDO to two real imbalanced data classification problems: Internet video traffic identification and metastasis detection of esophageal cancer. The experimental results once again validate the effectiveness of our approach.
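The two stages named in the abstract (probabilistic anchor selection, then Gaussian generation) can be illustrated with a minimal sketch. This is not the GDO algorithm as published: the abstract gives no formulas, so the weighting rule (share of majority-class points among the k nearest neighbors), the noise scale (mean neighbor distance times a factor `alpha`), and every name below (`gaussian_oversample`, `k`, `alpha`, `seed`) are assumptions made only for illustration.

```python
import numpy as np


def gaussian_oversample(X, y, minority_label, n_new, k=5, alpha=1.0, seed=0):
    """Illustrative Gaussian oversampling; NOT the published GDO formulas."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    if len(X) < 2:
        raise ValueError("need at least two instances")
    X_min = X[y == minority_label]
    if len(X_min) == 0:
        raise ValueError("no minority instances found")

    k_eff = min(k, len(X) - 1)
    weights = np.empty(len(X_min))
    scales = np.empty(len(X_min))
    for i, x in enumerate(X_min):
        dist = np.linalg.norm(X - x, axis=1)
        # Skip the first entry: it is the point itself (distance 0).
        nn = np.argsort(dist)[1:k_eff + 1]
        n_major = np.sum(y[nn] != minority_label)
        # Assumed density term: more majority neighbors -> higher weight.
        # The +1 smoothing keeps every weight strictly positive.
        weights[i] = (n_major + 1) / (k_eff + 1)
        # Assumed distance term: mean neighbor distance sets the noise scale.
        scales[i] = dist[nn].mean()

    # Stage 1: draw anchors with probability proportional to their weight.
    probs = weights / weights.sum()
    anchors = rng.choice(len(X_min), size=n_new, p=probs)

    # Stage 2: move from each anchor in a random direction by a distance
    # drawn from a zero-mean Gaussian (absolute value taken as the radius).
    directions = rng.normal(size=(n_new, X.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = np.abs(rng.normal(0.0, alpha * scales[anchors]))
    return X_min[anchors] + radii[:, None] * directions


if __name__ == "__main__":
    demo_rng = np.random.default_rng(1)
    X_demo = np.vstack([demo_rng.normal(0, 1, (50, 2)),
                        demo_rng.normal(3, 1, (8, 2))])
    y_demo = np.array([0] * 50 + [1] * 8)
    synthetic = gaussian_oversample(X_demo, y_demo, minority_label=1, n_new=42)
    print(synthetic.shape)
```

Unlike linear interpolation between neighbors, sampling around an anchor in a random direction lets synthetic points fall anywhere in the anchor's neighborhood; the paper itself should be consulted for the actual weighting and generation rules.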
Pages: 667-679
Page count: 13