A MeanShift-guided oversampling with self-adaptive sizes for imbalanced data classification

Cited by: 0
Authors
Tao, Xinmin [1]
Zhang, Xiaohan [1]
Zheng, Yujia [1]
Qi, Lin [1]
Fan, Zhiting [1]
Huang, Shan [1]
Affiliation
[1] Northeast Forestry Univ, Coll Civil Engn & Transportat, 26 Hexing Rd, Harbin 150040, Heilongjiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced datasets; Classification; Over-sampling; Overlapping; Within-class imbalance; INTRUSION DETECTION; SAMPLING METHOD; SMOTE; PERFORMANCE; NOISY
DOI
10.1016/j.ins.2024.120699
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Imbalanced data classification has gained popularity in the machine learning research domain because of its prevalence in numerous applications and its difficulty. However, most contemporary work focuses primarily on between-class imbalance. Previous research has shown that, when combined with other factors such as within-class imbalance, small sample size, and the presence of small disjuncts, imbalanced data significantly increase the difficulty of learning for traditional classifiers. Therefore, we propose a novel MeanShift-guided oversampling with self-adaptive sizes for imbalanced data classification. The proposed MeanShift-guided oversampling technique simultaneously considers the distributions of the minority and majority classes within a sphere centered on the current minority instance, which helps address small sample sizes and avoid the overlapping often caused by nearest neighbor (NN)-based oversampling techniques. Incorporating a random vector and a flexible cut-off mechanism for the vector length enhances the diversity of the generated synthetic minority instances and avoids overlapping, which makes the technique suitable for small sample size and small disjunct problems. To address between-class and within-class imbalance, we also introduce a self-adaptive size assignment strategy for each minority instance to be oversampled, in which the assigned size is inversely proportional to the instance's density and its distance from the majority class. In addition to eliminating within-class imbalance, this strategy ensures that informative minority instances on the class border have more opportunities to be oversampled, thus improving classification performance. Extensive experiments on datasets with different distributions and imbalance ratios show that the proposed algorithm outperforms the compared algorithms, with significant differences.
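The abstract names two mechanisms: sizes per minority instance that shrink with local density and with distance to the majority class, and a mean-shift-guided, randomly perturbed, length-capped generation step. The Python sketch below shows one plausible reading of those ideas; it is not the authors' algorithm. Every concrete choice is an assumption: the sphere radius `radius`, a flat-kernel mean-shift vector (local minority mean minus the instance, minus the local majority offset), density counted as minority instances in the sphere, weight `1 / (density * distance to nearest majority)`, and a cut-off of half the distance to the nearest majority instance. The function names `assign_sizes`, `generate`, and `oversample` are hypothetical.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def mean(points: Sequence[Sequence[float]]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(c) / n for c in zip(*points)]


def nearest_majority_distance(x: Sequence[float], majority: Sequence[Sequence[float]]) -> float:
    return min(math.dist(x, z) for z in majority)


def assign_sizes(minority, majority, radius: float, total: int) -> List[int]:
    """Hypothetical size assignment: weight ~ 1 / (density * distance to majority)."""
    weights = []
    for x in minority:
        # Assumed density proxy: minority instances inside the sphere (x included, so >= 1).
        density = sum(1 for y in minority if math.dist(x, y) <= radius)
        d_maj = max(nearest_majority_distance(x, majority), 1e-12)
        weights.append(1.0 / (density * d_maj))
    total_w = sum(weights)
    raw = [total * w / total_w for w in weights]
    sizes = [math.floor(r) for r in raw]
    # Largest-remainder rounding so the sizes sum exactly to `total`.
    remainder = total - sum(sizes)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - sizes[i], reverse=True)
    for i in order[:remainder]:
        sizes[i] += 1
    return sizes


def generate(x, minority, majority, radius: float, rng: random.Random) -> Point:
    """Generate one synthetic point around minority instance x (assumed scheme)."""
    in_min = [y for y in minority if math.dist(x, y) <= radius]  # contains x itself
    in_maj = [z for z in majority if math.dist(x, z) <= radius]
    # Flat-kernel mean-shift vector toward the local minority mean ...
    shift = [m - xi for m, xi in zip(mean(in_min), x)]
    # ... pushed away from the local majority mean when majority points are in the sphere.
    if in_maj:
        shift = [s - (m - xi) for s, m, xi in zip(shift, mean(in_maj), x)]
    # Random unit vector adds diversity among synthetic points.
    rnd = [rng.gauss(0.0, 1.0) for _ in x]
    rnorm = math.sqrt(sum(v * v for v in rnd)) or 1.0
    direction = [s + v / rnorm for s, v in zip(shift, rnd)]
    dnorm = math.sqrt(sum(v * v for v in direction)) or 1.0
    direction = [v / dnorm for v in direction]
    # Assumed cut-off: stay inside the sphere and within half the distance to the
    # nearest majority instance, so the new point stays closer to x than to any majority point.
    max_len = min(radius, 0.5 * nearest_majority_distance(x, majority))
    length = rng.uniform(0.0, max_len)
    return [xi + length * v for xi, v in zip(x, direction)]


def oversample(minority, majority, radius: float, seed: int = 0) -> List[Point]:
    """Balance the classes by generating len(majority) - len(minority) synthetic points."""
    rng = random.Random(seed)
    total = max(len(majority) - len(minority), 0)
    sizes = assign_sizes(minority, majority, radius, total)
    synthetic: List[Point] = []
    for x, k in zip(minority, sizes):
        synthetic.extend(generate(x, minority, majority, radius, rng) for _ in range(k))
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [2.0, 2.0]]
    majority = [[3.0, 3.0], [3.2, 2.9], [2.8, 3.1], [3.0, 2.7], [2.6, 2.8], [3.3, 3.0]]
    print(assign_sizes(minority, majority, radius=0.5, total=len(majority) - len(minority)))
    print(len(oversample(minority, majority, radius=0.5)))
```

In the toy data, the isolated instance `[2.0, 2.0]` is both the sparsest and the closest to the majority class, so under this weighting it receives most of the synthetic budget, mirroring the abstract's claim that border instances get more opportunities to be oversampled. The half-distance cut-off is a simple, conservative way to keep synthetic points from overlapping the majority class; the paper's actual "flexible" cut-off may differ.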
Pages: 42