A cluster-based SMOTE both-sampling (CSBBoost) ensemble algorithm for classifying imbalanced data

Cited by: 0
Authors
Amir Reza Salehi
Majid Khedmati
Affiliations
[1] Sharif University of Technology, Department of Industrial Engineering
Source
Keywords
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
In this paper, a Cluster-based Synthetic Minority Oversampling Technique (SMOTE) Both-sampling (CSBBoost) ensemble algorithm is proposed for classifying imbalanced data. The algorithm combines over-sampling, under-sampling, and several ensemble algorithms, including Extreme Gradient Boosting (XGBoost), random forest, and bagging, to achieve a balanced dataset and to address issues such as data redundancy after over-sampling, information loss in under-sampling, and random sample selection for sampling and sample generation. The performance of the proposed algorithm is evaluated against state-of-the-art competing algorithms on 20 benchmark imbalanced datasets in terms of the harmonic mean of precision and recall (F1) and the area under the receiver operating characteristic curve (AUC). Based on the results, the proposed CSBBoost algorithm performs significantly better than the competing algorithms. In addition, a real-world dataset is used to demonstrate the applicability of the proposed algorithm.
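The two evaluation measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function names `f1_score` and `auc_score` are hypothetical, labels are assumed to be binary with 1 marking the minority (positive) class, and AUC is computed through its standard pairwise-ranking interpretation rather than by tracing the ROC curve.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1: harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # Assumed convention: with no true positives, F1 is reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc_score(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Illustrative imbalanced toy data (3 positives, 7 negatives); not from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]

print(f1_score(y_true, y_pred))
print(auc_score(y_true, scores))
```

Both measures are commonly preferred over plain accuracy on imbalanced data, because a classifier that always predicts the majority class can score high accuracy while its F1 on the minority class is 0.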
Related papers
50 items in total
  • [21] Research on expansion and classification of imbalanced data based on SMOTE algorithm
    Shujuan Wang
    Yuntao Dai
    Jihong Shen
    Jingxue Xuan
    Scientific Reports, 11
  • [22] Over-Sampling Method on Imbalanced Data Based on WKMeans and SMOTE
    Chen, Junfeng
    Zheng, Zhongtuan
    Computer Engineering and Applications, 2024, 57 (23) : 106 - 112
  • [23] A Cluster-based Regrouping Approach for Imbalanced Data Distributions
    Yu, Wen
    Jiang, ShengYi
    2012 WORLD AUTOMATION CONGRESS (WAC), 2012,
  • [24] Research on expansion and classification of imbalanced data based on SMOTE algorithm
    Wang, Shujuan
    Dai, Yuntao
    Shen, Jihong
    Xuan, Jingxue
    SCIENTIFIC REPORTS, 2021, 11 (01)
  • [25] Cluster-Based Instance Selection for the Imbalanced Data Classification
    Czarnowski, Ireneusz
    Jedrzejowicz, Piotr
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2018, PT II, 2018, 11056 : 191 - 200
  • [26] CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification
    Rayhan, Farshid
    Ahmed, Sajid
    Mahbub, Asif
    Jani, Md. Rafsan
    Shatabda, Swakkhar
    Farid, Dewan Md.
    2017 2ND INTERNATIONAL CONFERENCE ON COMPUTATIONAL SYSTEMS AND INFORMATION TECHNOLOGY FOR SUSTAINABLE SOLUTION (CSITSS-2017), 2017, : 70 - 75
  • [27] Classifying imbalanced data using SMOTE based class-specific kernelized ELM
    Raghuwanshi, Bhagat Singh
    Shukla, Sanyam
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2021, 12 (05) : 1255 - 1280
  • [28] Classifying imbalanced data using SMOTE based class-specific kernelized ELM
    Bhagat Singh Raghuwanshi
    Sanyam Shukla
    International Journal of Machine Learning and Cybernetics, 2021, 12 : 1255 - 1280
  • [29] A method of classifying imbalanced credit data based on the AC-CTGAN hybrid sampling algorithm
    Chen, Tinggui
    Gu, Hailian
    Yang, Zhiyu
    Yang, Jianjun
    Wang, Bing
    JOURNAL OF CREDIT RISK, 2024, 20 (03):
  • [30] A hybrid sampling algorithm combining M-SMOTE and ENN based on Random forest for medical imbalanced data
    Xu, Zhaozhao
    Shen, Derong
    Nie, Tiezheng
    Kou, Yue
    JOURNAL OF BIOMEDICAL INFORMATICS, 2020, 107