A MeanShift-guided oversampling with self-adaptive sizes for imbalanced data classification

Cited by: 0

Authors:

Tao, Xinmin [1]

Zhang, Xiaohan [1]

Zheng, Yujia [1]

Qi, Lin [1]

Fan, Zhiting [1]

Huang, Shan [1]

Affiliations:

[1] Northeast Forestry Univ, Coll Civil Engn & Transportat, 26 Hexing Rd, Harbin 150040, Heilongjiang, Peoples R China

Source:

INFORMATION SCIENCES | 2024 / Vol. 672

Funding:

National Natural Science Foundation of China;

Keywords:

Imbalanced datasets; Classification; Over-sampling; Overlapping; Within-class imbalance; INTRUSION DETECTION; SAMPLING METHOD; SMOTE; PERFORMANCE; NOISY;

DOI:

10.1016/j.ins.2024.120699

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Imbalanced data classification has attracted growing attention in machine learning research because it arises in numerous applications and is difficult to solve. However, most contemporary work focuses on between-class imbalance. Previous research has shown that, when combined with other factors such as within-class imbalance, small sample size and the presence of small disjuncts, imbalanced data make learning significantly harder for traditional classifiers. Therefore, we propose a novel MeanShift-guided oversampling method with self-adaptive sizes for imbalanced data classification. The proposed MeanShift-guided oversampling technique simultaneously considers the distributions of the minority and majority classes within a sphere centered on the current minority instance, which helps address small sample sizes and avoid the overlapping often caused by nearest neighbor (NN)-based oversampling techniques. Incorporating a random vector and a flexible cut-off mechanism for vector length enhances the diversity of the generated synthetic minority instances and avoids overlapping, which makes the method suitable for small-sample-size and small-disjunct problems. To address both between-class and within-class imbalance, we also introduce a self-adaptive size assignment strategy for each minority instance to be oversampled, in which the assigned size is inversely proportional to the instance's density and to its distance from the majority class. In addition to eliminating within-class imbalance, the strategy ensures that informative border minority instances have more opportunities to be oversampled, thus improving classification performance. Extensive experiments on datasets with different distributions and imbalance ratios show that the proposed algorithm significantly outperforms the compared methods.
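
The self-adaptive size assignment described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract states only that the assigned size is inversely proportional to an instance's density and to its distance from the majority class. Everything else here is an assumption made for illustration, including the function name `adaptive_sizes`, the density estimate (inverse of the mean distance to the `k` nearest minority neighbors), the way the two factors are multiplied together, and the rounding scheme.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length point tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def adaptive_sizes(minority, majority, total, k=2, eps=1e-12):
    """Split `total` synthetic samples across the minority instances.

    Assumed weighting (hypothetical, not taken from the paper):
        weight_i = 1 / (density_i * distance_to_nearest_majority_i)
    Requires at least two minority points and one majority point.
    """
    weights = []
    for i, x in enumerate(minority):
        # Distances from x to every other minority instance, ascending.
        dists = sorted(euclidean(x, m) for j, m in enumerate(minority) if j != i)
        nearest = dists[:k]
        # Assumed density estimate: inverse mean distance to the k nearest
        # minority neighbors (sparse regions get a low density).
        density = 1.0 / (sum(nearest) / len(nearest) + eps)
        # Distance to the closest majority instance (small near the border).
        d_major = min(euclidean(x, m) for m in majority)
        weights.append(1.0 / (density * (d_major + eps)))

    # Normalize weights into fractional sizes, then round so that the
    # integer sizes still sum exactly to `total` (largest remainders first).
    weight_sum = sum(weights)
    raw = [total * w / weight_sum for w in weights]
    sizes = [int(math.floor(r)) for r in raw]
    leftover = total - sum(sizes)
    by_fraction = sorted(range(len(raw)), key=lambda i: raw[i] - sizes[i], reverse=True)
    for i in by_fraction[:leftover]:
        sizes[i] += 1
    return sizes


if __name__ == "__main__":
    minority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0)]
    majority = [(3.5, 3.5), (10.0, 10.0)]
    print(adaptive_sizes(minority, majority, total=10))
```

In this toy data the isolated minority point near the majority class receives the largest share. That matches the stated goal of giving sparse, border instances more opportunities to be oversampled.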

Pages: 42
