A No Parameter Synthetic Minority Oversampling Technique Based on Finch for Imbalanced Data

Times cited: 1
Authors
Xu, Shoukun [1]
Li, Zhibang [1]
Yuan, Baohua [1]
Yang, Gaochao [1]
Wang, Xueyuan [1]
Li, Ning [1]
Affiliation
[1] Changzhou Univ, Coll Comp & Artificial Intelligence, Changzhou 213164, Jiangsu, Peoples R China
Keywords
SMOTE; FINCH algorithm; Synthesis strategy; Sampling method
DOI
10.1007/978-981-99-4752-2_31
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The synthetic minority oversampling technique (SMOTE) has become a major approach to class imbalance in machine learning. However, it suffers from problems such as the uneven distribution of minority-class data and the questionable quality of the synthetic data it produces. Improved variants that combine SMOTE with clustering algorithms face further difficulties, notably choosing optimal hyperparameter values and class overlap. This paper therefore proposes an improved algorithm, NP-SMOTE, which works as follows. First, the FINCH algorithm clusters the minority-class data into distinct clusters. Next, the samples in each cluster are divided into boundary samples and central samples according to the classes of each minority sample's nearest neighbors. Finally, a synthesis method suited to each of these two types is applied to generate new minority-class samples. The algorithm requires no predetermined hyperparameters, and by synthesizing data for each type of sample in a tailored way it avoids the problems caused by class overlap. Comparisons with commonly used algorithms on six datasets demonstrate its robustness and superior generalizability.
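The three steps in the abstract can be sketched in Python as follows. This is a minimal, illustrative sketch under stated assumptions, not the authors' NP-SMOTE implementation. It uses only the first partition of FINCH (link each sample to its first nearest neighbour, then take connected components). It assumes a minority sample is "boundary" when its single nearest neighbour in the full training set is a majority sample. Because the abstract does not specify the two synthesis methods, central samples use standard SMOTE interpolation within their cluster, while boundary samples use a hypothetical rule: a step of at most half the distance toward a central sample of the same cluster. The function names (`finch_first_partition`, `split_boundary_central`, `np_smote_sketch`) and the per-cluster quota proportional to cluster size are also assumptions.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]


def first_neighbors(points: List[Vector]) -> List[int]:
    """Index of each point's nearest other point (brute force; needs >= 2 points)."""
    return [
        min((j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]))
        for i in range(len(points))
    ]


def finch_first_partition(points: List[Vector]) -> List[int]:
    """FINCH's first partition: link i to its first neighbour, label components.

    Joining each i with kappa[i] also joins points that share a first neighbour,
    so this matches FINCH's adjacency rule (j == k_i, k_j == i, or k_i == k_j).
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in enumerate(first_neighbors(points)):
        parent[find(i)] = find(j)
    ids: Dict[int, int] = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


def split_boundary_central(minority: List[Vector],
                           majority: List[Vector]) -> Tuple[List[int], List[int]]:
    """ASSUMED rule: boundary if the nearest neighbour in the full data is a majority sample."""
    boundary, central = [], []
    for i, x in enumerate(minority):
        d_min = min((math.dist(x, m) for k, m in enumerate(minority) if k != i),
                    default=math.inf)
        d_maj = min((math.dist(x, m) for m in majority), default=math.inf)
        (boundary if d_maj < d_min else central).append(i)
    return boundary, central


def np_smote_sketch(minority: List[Vector], majority: List[Vector],
                    seed: int = 0) -> List[Vector]:
    """Generate roughly len(majority) - len(minority) synthetic minority samples."""
    rng = random.Random(seed)
    boundary, _ = split_boundary_central(minority, majority)
    is_boundary = set(boundary)
    clusters: Dict[int, List[int]] = defaultdict(list)
    for i, c in enumerate(finch_first_partition(minority)):
        clusters[c].append(i)
    n_needed = len(majority) - len(minority)
    synthetic: List[Vector] = []
    for members in clusters.values():
        # ASSUMED: quota proportional to cluster size (rounding may shift the total)
        quota = round(n_needed * len(members) / len(minority))
        centres = [i for i in members if i not in is_boundary]
        for _ in range(quota):
            i = rng.choice(members)
            if i in is_boundary and centres:
                # HYPOTHETICAL boundary rule: move at most halfway toward a central sample
                j, gap = rng.choice(centres), rng.uniform(0.0, 0.5)
            else:
                # Standard SMOTE interpolation with another member of the same cluster
                j, gap = rng.choice([k for k in members if k != i] or [i]), rng.random()
            a, b = minority[i], minority[j]
            synthetic.append([ai + gap * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic


if __name__ == "__main__":
    minority = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10]]
    majority = [[1.5, 0]] + [[5, float(y)] for y in range(14)]
    print(finch_first_partition(minority))                # [0, 0, 0, 0, 1, 1, 1]
    print(split_boundary_central(minority, majority)[0])  # [2]
    print(len(np_smote_sketch(minority, majority)))       # 8
```

Using only FINCH's first partition keeps the sketch short. FINCH itself builds a hierarchy by re-clustering the cluster means at each level, and the abstract does not say which level NP-SMOTE uses.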
Pages: 367–378
Number of pages: 12