PF-SMOTE: A novel parameter-free SMOTE for imbalanced datasets

Cited by: 26
Authors
Chen, Qiong [1]
Zhang, Zhong-Liang [1, 2, 3]
Huang, Wen-Po [1]
Wu, Jian [1, 3]
Luo, Xing-Gang [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Management, Hangzhou 310018, Peoples R China
[2] Shanghai Jiao Tong Univ, Antai Coll Econ & Management, Shanghai 200030, Peoples R China
[3] Hangzhou Dianzi Univ, Res Ctr Youth Publ Opin Zhejiang, Hangzhou 310018, Peoples R China
Funding
National Science Foundation (United States)
Keywords
Imbalanced datasets; Data preprocessing; SMOTE; Gaussian process; Oversampling; OVERSAMPLING TECHNIQUE; SAMPLING APPROACH; DATA-SETS; CLASSIFICATION; NOISY; TREES
DOI
10.1016/j.neucom.2022.05.017
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Class imbalance learning is one of the most important topics in machine learning and data mining, and the Synthetic Minority Oversampling Technique (SMOTE) is a common method for handling it. The main shortcoming of classic SMOTE and its variants is that they may interpolate potentially noisy and unrepresentative examples. This paper proposes a novel parameter-free SMOTE mechanism that produces a sufficient number of representative synthetic examples while avoiding interpolation of noisy examples. Specifically, two types of minority class examples are defined: boundary and safe minority examples. The synthetic example generation procedure fully reflects the characteristics of the minority class examples by filling the region dominated by the minority class and expanding the margin of the minority class. To verify the effectiveness and robustness of the proposed method, a thorough experimental study is carried out on forty datasets from real-world applications. The experimental results indicate that the proposed method is competitive with classic SMOTE and its state-of-the-art variants. (c) 2022 Elsevier B.V. All rights reserved.
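For readers unfamiliar with the mechanics, the Python sketch below illustrates two ideas the abstract relies on: classic SMOTE interpolation, where a synthetic point is x + λ(x̂ − x) with x̂ a randomly chosen one of x's k nearest minority neighbors and λ drawn uniformly from [0, 1), and a split of minority examples into safe, boundary, and noisy ones so that noise is never interpolated. This is not PF-SMOTE. The abstract does not give PF-SMOTE's definitions, so the safe/boundary/noise rule here is an assumption borrowed from Borderline-SMOTE (count the majority examples among the k nearest neighbors), and unlike the parameter-free method it requires a neighborhood size k. The function names `categorize` and `smote` are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def categorize(minority: Sequence[Point], majority: Sequence[Point], k: int = 5) -> List[str]:
    """Label each minority example 'safe', 'boundary', or 'noise'.

    ASSUMPTION: Borderline-SMOTE-style rule, not PF-SMOTE's own definition.
    Count majority examples among the k nearest neighbors (all classes):
    all k -> noise, at least k/2 -> boundary, fewer -> safe.
    """
    labeled = [(p, 1) for p in minority] + [(q, 0) for q in majority]
    labels = []
    for i, p in enumerate(minority):
        # Distances to every other example; index i is p itself, so skip it.
        neighbors = sorted(
            (math.dist(p, q), y) for j, (q, y) in enumerate(labeled) if j != i
        )[:k]
        n_majority = sum(1 for _, y in neighbors if y == 0)
        if n_majority == k:
            labels.append("noise")
        elif n_majority >= k / 2:
            labels.append("boundary")
        else:
            labels.append("safe")
    return labels


def smote(minority: Sequence[Point], n_new: int, k: int = 5,
          rng: Optional[random.Random] = None) -> List[Point]:
    """Classic SMOTE: interpolate between a random minority example and a
    randomly chosen one of its k nearest minority neighbors."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbors of x (excluding x itself)
        candidates = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        neighbor = minority[rng.choice(candidates)]
        gap = rng.random()  # lambda drawn uniformly from [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbor)))
    return synthetic


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]
    majority = [(10.5, 10.5), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    labels = categorize(minority, majority, k=3)
    # Drop examples labeled noise so they are never used for interpolation.
    kept = [p for p, lab in zip(minority, labels) if lab != "noise"]
    new_points = smote(kept, n_new=5, k=3, rng=random.Random(42))
    print(labels)
    print(new_points)
```

In the example run, the isolated minority point at (10, 10) has only majority neighbors, so it is labeled noise and excluded; every synthetic point then lies on a segment between two of the four remaining corner points. How PF-SMOTE treats boundary examples differently from safe ones in order to expand the minority margin is specific to the paper and is not modeled here.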
Pages: 75–88
Page count: 14