An improved and random synthetic minority oversampling technique for imbalanced data

Cited by: 28
Authors
Wei, Guoliang [1]
Mu, Weimeng [1]
Song, Yan [2]
Dou, Jun [2]
Affiliations
[1] Univ Shanghai Sci & Technol, Sch Sci, Shanghai 200093, Peoples R China
[2] Univ Shanghai Sci & Technol, Dept Control Sci & Engn, Shanghai 200093, Peoples R China
Funding
Natural Science Foundation of Shanghai
Keywords
Imbalanced data; Improved and random SMOTE; K-means algorithm; Synthesis strategy; Kernel density estimation; SAMPLING APPROACH; SMOTE; CLASSIFICATION
DOI
10.1016/j.knosys.2022.108839
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Imbalanced data learning has become a major challenge in data mining and machine learning. Oversampling is an effective way to restore class balance by generating new samples. However, most oversampling methods perform poorly in the presence of noise and complicated distribution structures, and they readily generate redundant, unsafe, or outlier samples. To handle this problem, we propose a novel oversampling method, namely the Improved and Random Synthetic Minority Oversampling Technique (IR-SMOTE). The core idea of IR-SMOTE is three-fold: (1) by sorting the majority class samples in ascending order, noise samples in each cluster of the minority class obtained by k-means clustering are removed; (2) the number of synthetic samples is adaptively assigned to each minority-class cluster by means of kernel density estimation; and (3) based on the attributes of the temporary synthetic samples obtained via random-SMOTE, a new synthesizing method is developed to generate new samples with guaranteed diversity. Finally, comparison experiments on 18 well-known data sets illustrate the effectiveness and broad applicability of the proposed IR-SMOTE method for imbalanced data classification. © 2022 Elsevier B.V. All rights reserved.
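The abstract names two building blocks that a short sketch can make concrete: a density-based split of the synthetic-sample budget across minority clusters, and the two-stage interpolation usually attributed to random-SMOTE. The Python below is an illustrative sketch only, not the authors' implementation. The function names (`gaussian_kde_density`, `allocate_by_density`, `random_smote_point`), the isotropic Gaussian kernel, the fixed `bandwidth`, and the choice to give sparser clusters a larger share are all assumptions made here; the abstract does not state the allocation rule, and IR-SMOTE's own synthesis step differs from plain random-SMOTE in ways the abstract does not detail.

```python
import math
import random


def gaussian_kde_density(point, samples, bandwidth):
    """Kernel density estimate at `point` using an isotropic Gaussian kernel."""
    dim = len(point)
    norm = (2 * math.pi * bandwidth ** 2) ** (dim / 2)
    total = 0.0
    for s in samples:
        sq_dist = sum((a - b) ** 2 for a, b in zip(point, s))
        total += math.exp(-sq_dist / (2 * bandwidth ** 2))
    return total / (len(samples) * norm)


def allocate_by_density(clusters, n_new, bandwidth=1.0):
    """Split n_new synthetic samples across clusters.

    ASSUMPTION: the weight of a cluster is the inverse of its mean
    estimated density, so sparser clusters receive more samples.
    """
    weights = []
    for cluster in clusters:
        mean_density = sum(
            gaussian_kde_density(p, cluster, bandwidth) for p in cluster
        ) / len(cluster)
        weights.append(1.0 / mean_density)
    total = sum(weights)
    counts = [int(n_new * w / total) for w in weights]
    # Give the rounding remainder to the clusters with the largest weights.
    remainder = n_new - sum(counts)
    by_weight = sorted(range(len(clusters)), key=lambda i: -weights[i])
    for i in by_weight[:remainder]:
        counts[i] += 1
    return counts


def random_smote_point(x, y1, y2, rng=random):
    """Two-stage interpolation in the style of random-SMOTE.

    A temporary sample is drawn on the segment y1-y2, then the new
    sample is drawn on the segment from x to that temporary sample.
    """
    gap_temp = rng.random()
    temp = [a + gap_temp * (b - a) for a, b in zip(y1, y2)]
    gap_new = rng.random()
    return [xi + gap_new * (ti - xi) for xi, ti in zip(x, temp)]
```

As a usage example, `allocate_by_density([dense_cluster, sparse_cluster], 10)` returns a list of two integers that sum to 10, and `random_smote_point(x, y1, y2)` returns a point inside the triangle spanned by the three minority samples. A production pipeline would more likely rely on a tested k-means and KDE implementation than on these hand-written helpers.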
Pages: 13
Related papers
50 in total
  • [1] Clustering-based improved adaptive synthetic minority oversampling technique for imbalanced data classification
    Jin, Dian
    Xie, Dehong
    Liu, Di
    Gong, Murong
    [J]. INTELLIGENT DATA ANALYSIS, 2023, 27 (03) : 635 - 652
  • [2] A Novel Synthetic Minority Oversampling Technique for Imbalanced Data Set Learning
    Barua, Sukarna
    Islam, Md. Monirul
    Murase, Kazuyuki
    [J]. NEURAL INFORMATION PROCESSING, PT II, 2011, 7063 : 735 - +
  • [3] Performance of Synthetic Minority Oversampling Technique on Imbalanced Breast Cancer Data
    Rani, K. Usha
    Ramadevi, G. Naga
    Lavanya, D.
    [J]. PROCEEDINGS OF THE 10TH INDIACOM - 2016 3RD INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT, 2016, : 1623 - 1627
  • [4] A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios
    Tripathi, Ayush
    Chakraborty, Rupayan
    Kopparapu, Sunil Kumar
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 10650 - 10657
  • [5] CDSMOTE: class decomposition and synthetic minority class oversampling technique for imbalanced-data classification
    Elyan, Eyad
    Moreno-Garcia, Carlos Francisco
    Jayne, Chrisina
    [J]. NEURAL COMPUTING & APPLICATIONS, 2021, 33 (07) : 2839 - 2851
  • [6] CDSMOTE: class decomposition and synthetic minority class oversampling technique for imbalanced-data classification
    Elyan, Eyad
    Moreno-Garcia, Carlos Francisco
    Jayne, Chrisina
    [J]. Neural Computing and Applications, 2021, 33 (07) : 2839 - 2851
  • [7] Learning class-imbalanced data with region-impurity synthetic minority oversampling technique
    Li, Der-Chiang
    Wang, Ssu-Yang
    Huang, Kuan-Cheng
    Tsai, Tung-I
    [J]. INFORMATION SCIENCES, 2022, 607 : 1391 - 1407
  • [8] A Synthetic Minority Oversampling Technique Based on Gaussian Mixture Model Filtering for Imbalanced Data Classification
    Xu, Zhaozhao
    Shen, Derong
    Kou, Yue
    Nie, Tiezheng
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (03) : 3740 - 3753
  • [9] CDSMOTE: class decomposition and synthetic minority class oversampling technique for imbalanced-data classification
    Eyad Elyan
    Carlos Francisco Moreno-Garcia
    Chrisina Jayne
    [J]. Neural Computing and Applications, 2021, 33 : 2839 - 2851
  • [10] A theoretical distribution analysis of synthetic minority oversampling technique (SMOTE) for imbalanced learning
    Elreedy, Dina
    Atiya, Amir F.
    Kamalov, Firuz
    [J]. MACHINE LEARNING, 2024, 113 (07) : 4903 - 4923