PF-SMOTE: A novel parameter-free SMOTE for imbalanced datasets

Cited by: 26
Authors
Chen, Qiong [1 ]
Zhang, Zhong-Liang [1 ,2 ,3 ]
Huang, Wen-Po [1 ]
Wu, Jian [1 ,3 ]
Luo, Xing-Gang [1 ]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Management, Hangzhou 310018, Peoples R China
[2] Shanghai Jiao Tong Univ, Antai Coll Econ & Management, Shanghai 200030, Peoples R China
[3] Hangzhou Dianzi Univ, Res Ctr Youth Publ Opin Zhejiang, Hangzhou 310018, Peoples R China
Funding
US National Science Foundation;
Keywords
Imbalanced datasets; Data preprocessing; SMOTE; Gaussian process; Oversampling; OVERSAMPLING TECHNIQUE; SAMPLING APPROACH; DATA-SETS; CLASSIFICATION; NOISY; TREES;
DOI
10.1016/j.neucom.2022.05.017
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Class imbalance learning is one of the most important topics in the fields of machine learning and data mining, and the Synthetic Minority Oversampling Technique (SMOTE) is a common method for handling this issue. The main shortcoming of the classic SMOTE and its variants is the interpolation of potential noise and unrepresentative examples. This paper proposes a novel parameter-free SMOTE mechanism that produces sufficient representative synthetic examples while avoiding the interpolation of noisy examples. Specifically, two types of minority class examples are defined, namely boundary and safe minority examples. The synthetic example generation procedure fully reflects the characteristics of the minority class examples by filling the region dominated by the minority class and expanding the margin of the minority class. To verify the effectiveness and robustness of the proposed method, a thorough experimental study on forty datasets selected from real-world applications is carried out. The experimental results indicate that our proposed method is competitive with the classic SMOTE and its state-of-the-art variants. (c) 2022 Elsevier B.V. All rights reserved.
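For context on the interpolation behaviour the abstract criticizes, the following is a minimal sketch of classic SMOTE-style oversampling: each synthetic point is drawn on the line segment between a minority example and one of its minority-class nearest neighbours. This is not the authors' PF-SMOTE; the function name smote_oversample and its parameters (n_synthetic, k) are illustrative assumptions only.

    # Illustrative sketch of classic SMOTE-style interpolation (NOT the paper's PF-SMOTE).
    # All names here (smote_oversample, n_synthetic, k) are hypothetical, for illustration.
    import numpy as np

    def smote_oversample(X_min, n_synthetic, k=5, rng=None):
        """Generate n_synthetic points by interpolating minority examples in X_min."""
        rng = np.random.default_rng(rng)
        n = len(X_min)
        k = min(k, n - 1)  # cannot use more neighbours than remaining minority points
        # Pairwise distances among minority examples (brute force, for clarity).
        d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)           # exclude self-matches
        nn = np.argsort(d, axis=1)[:, :k]     # k nearest minority-class neighbours
        synthetic = np.empty((n_synthetic, X_min.shape[1]))
        for i in range(n_synthetic):
            base = rng.integers(n)            # pick a minority example at random
            neigh = nn[base, rng.integers(k)] # pick one of its minority neighbours
            gap = rng.random()                # interpolation coefficient in [0, 1)
            synthetic[i] = X_min[base] + gap * (X_min[neigh] - X_min[base])
        return synthetic

    # Example: generate four synthetic points from a toy 2-D minority class.
    X_min = np.array([[0.0, 0.0], [1.0, 0.2], [0.8, 1.1], [0.2, 0.9]])
    X_syn = smote_oversample(X_min, n_synthetic=4, k=2, rng=0)
    print(X_syn.shape)  # (4, 2)

Because the interpolation targets are chosen only by proximity, a noisy or unrepresentative minority example can spawn synthetic points around it; avoiding this, without extra user-set parameters, is the gap PF-SMOTE is stated to address.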
Pages: 75-88
Page count: 14