Consensus Clustering-Based Undersampling Approach to Imbalanced Learning

Cited by: 132
Author
Onan, Aytug [1]
Affiliation
[1] Izmir Katip Celebi Univ, Fac Engn & Architecture, Dept Comp Engn, TR-35620 Izmir, Turkey
Keywords
ALGORITHM;
DOI
10.1155/2019/5901087
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Class imbalance is an important problem in machine learning applications, in which one class (the minority class) has an extremely small number of instances while the other (the majority class) has a very large number. Imbalanced datasets are important in several real-world applications, including medical diagnosis, malware detection, anomaly identification, bankruptcy prediction, and spam filtering. In this paper, we present a consensus clustering-based undersampling approach to imbalanced learning, in which the majority class is undersampled by means of a consensus clustering scheme. The empirical analysis uses 44 small-scale and 2 large-scale imbalanced classification benchmarks. For the consensus clustering schemes, five clustering algorithms (k-means, k-modes, k-means++, self-organizing maps, and DIANA) and their combinations were considered. In the classification phase, five supervised learning methods (naive Bayes, logistic regression, support vector machines, random forests, and k-nearest neighbor) and three ensemble learning methods (AdaBoost, bagging, and random subspace) were used. The empirical results indicate that the proposed heterogeneous consensus clustering-based undersampling scheme yields better predictive performance.
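As a rough illustration of the idea in the abstract, the following Python sketch undersamples a majority class through consensus clustering. It is not the author's implementation. The abstract does not say how the base partitions are combined or how instances are chosen from each cluster, so the co-association matrix, the average-linkage merging, the rule of keeping one instance per consensus cluster, and all function names are assumptions made for this example. Plain k-means with different seeds stands in for the five base algorithms listed.

```python
import random
from itertools import combinations


def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def co_association(labelings, n):
    """Fraction of base clusterings that put points i and j together."""
    co = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0 / len(labelings)
    return co


def avg_link(co, a, b):
    """Average co-association between two groups of point indices."""
    return sum(co[i][j] for i in a for j in b) / (len(a) * len(b))


def consensus_clusters(co, k):
    """Greedily merge the most co-associated groups until k remain."""
    clusters = [[i] for i in range(len(co))]
    while len(clusters) > k:
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda ab: avg_link(co, clusters[ab[0]],
                                           clusters[ab[1]]))
        clusters[a] += clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


def undersample(majority, n_keep, n_runs=5):
    """Keep one representative majority instance per consensus cluster."""
    labelings = [kmeans(majority, n_keep, seed) for seed in range(n_runs)]
    co = co_association(labelings, len(majority))
    kept = []
    for members in consensus_clusters(co, n_keep):
        pts = [majority[i] for i in members]
        mean = tuple(sum(col) / len(pts) for col in zip(*pts))
        # Assumption: the instance nearest the cluster mean represents it.
        kept.append(min(pts, key=lambda p: dist2(p, mean)))
    return kept


# Toy data (hypothetical): 9 majority instances, 3 minority instances.
majority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
            (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
            (9.0, 0.0), (9.1, 0.3), (8.8, 0.2)]
minority = [(2.0, 2.0), (2.5, 2.2), (7.0, 3.0)]
balanced_majority = undersample(majority, n_keep=len(minority))
```

Setting the number of consensus clusters to the minority class size yields a balanced training set of `balanced_majority + minority`. This is only one possible design choice; other selection rules, such as using cluster means, fit the same skeleton.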
Pages: 14