Effects of class imbalance on resampling and ensemble learning for improved prediction of cyanobacteria blooms

Cited by: 32
Authors
Shin, Jihoon [1]
Yoon, Seonghyeon [2]
Kim, YoungWoo [1]
Kim, Taeho [1]
Go, ByeongGeon [1]
Cha, YoonKyung [1]
Affiliations
[1] Univ Seoul, Sch Environm Engn, Seoul 130743, South Korea
[2] Natl Inst Environm Res, Total Load Management Ctr, Hwangyeong Ro 42, Incheon 22689, South Korea
Keywords
Class imbalance; Imbalance ratio; Resampling; SMOTE; Ensemble classifier; Cyanobacteria blooms; DATA CLASSIFICATION; WATER TEMPERATURE; ALGAL BLOOMS; FRESH-WATER; IMPUTATION; IMPACT; RIVER; TIME; EUTROPHICATION; CLASSIFIERS;
DOI
10.1016/j.ecoinf.2020.101202
Chinese Library Classification number
Q14 [Ecology (Bioecology)];
Discipline classification code
071012; 0713;
Abstract
This study aimed to explicitly explore the effects of the degree of class imbalance on predicting infrequently occurring events, i.e., cyanobacteria blooms. Although class imbalance poses a major issue in binary classification schemes, few efforts have been made to relate model performance with real-life applications. The data utilized herein were collected from 2013 to 2019 at 13 sites within three major rivers in South Korea; a variety of physicochemical and hydrometeorological factors were obtained as input variables, and the occurrence of cyanobacteria blooms (indicated by a cell count ≥ 1000 cells/mL) was included as a response variable. The imbalance ratio (IR) for cyanobacteria blooms differed significantly by site, ranging widely from 0.93 to 9.32. The study results suggested that class imbalance negatively affected model performance, with an increase in the IR significantly increasing the false negative (FN) rate. The application of resampling decreased the FN rate while simultaneously increasing the true positive (TP) rate, which yielded improvements that tended to increase with increasing IRs. Ensemble classifiers, which combine multiple single classifiers into an integrated classifier, alone could not successfully address the class imbalance problem; however, in combination with resampling, they consistently outperformed single classifiers. Among the ensemble classifiers, AdaBoost yielded the most stable performance across a range of IRs, irrespective of the resampling application. A variable importance analysis indicated that temperature was usually the primary influencing factor of cyanobacteria blooms. These findings highlight the effectiveness of resampling applications for addressing class imbalance, while providing useful guidelines for learning from imbalanced data, including the selection of classification algorithms and model evaluation metrics.
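The quantities named in the abstract (bloom labels, IR, FN rate, TP rate, and resampling) can be illustrated with a minimal Python sketch. It is not the authors' code. Several things are assumptions made for illustration: the IR is taken as the number of non-bloom samples divided by the number of bloom samples (which would be consistent with a reported value below 1), the bloom label uses a "greater than or equal to" comparison at 1000 cells/mL, and plain random oversampling stands in for the resampling step rather than SMOTE. All function names are hypothetical.

```python
import random

BLOOM_THRESHOLD = 1000  # cells/mL, the bloom threshold stated in the abstract


def label_blooms(cell_counts, threshold=BLOOM_THRESHOLD):
    """Return 1 (bloom) or 0 (non-bloom) per sample; the >= comparison is an assumption."""
    return [1 if count >= threshold else 0 for count in cell_counts]


def imbalance_ratio(labels):
    """Assumed definition: number of non-bloom samples / number of bloom samples."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("IR is undefined without at least one bloom sample")
    return negatives / positives


def fn_tp_rates(y_true, y_pred):
    """Return (FN rate, TP rate), both computed over the actual bloom samples."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    actual_positives = tp + fn
    if actual_positives == 0:
        raise ValueError("rates are undefined without at least one bloom sample")
    return fn / actual_positives, tp / actual_positives


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size.

    This is random oversampling, a simpler stand-in for SMOTE, which would
    instead synthesize new minority samples by interpolating between neighbors.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted((positives, negatives), key=len)
    if not minority:
        raise ValueError("cannot oversample a class with no samples")
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Hypothetical cell counts for one site, not data from the study.
counts = [200, 1500, 300, 50, 5000, 10]
labels = label_blooms(counts)
ir_before = imbalance_ratio(labels)
balanced_rows, balanced_labels = random_oversample(counts, labels)
ir_after = imbalance_ratio(balanced_labels)
```

Because the FN rate and TP rate are both computed over the actual bloom samples, they sum to 1, which is why a resampling step that lowers one necessarily raises the other. Resampling would normally be applied to the training split only, so that evaluation still reflects the original class distribution.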
Pages: 13
Related papers
50 in total
  • [1] Resampling-Based Ensemble Methods for Online Class Imbalance Learning
    Wang, Shuo
    Minku, Leandro L.
    Yao, Xin
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (05) : 1356 - 1368
  • [2] An Ensemble Learning Approach with Gradient Resampling for Class-Imbalance Problems
    Zhao, Hongke
    Zhao, Chuang
    Zhang, Xi
    Liu, Nanlin
    Zhu, Hengshu
    Liu, Qi
    Xiong, Hui
    [J]. INFORMS JOURNAL ON COMPUTING, 2023, 35 (04) : 747 - 763
  • [3] Improved hybrid resampling and ensemble model for imbalance learning and credit evaluation
    Kou, Gang
    Chen, Hao
    Hefni, Mohammed A.
    [J]. JOURNAL OF MANAGEMENT SCIENCE AND ENGINEERING, 2022, 7 (04) : 511 - 529
  • [4] Prediction of rhinitis with class imbalance based on heterogeneous ensemble learning
    Yang, Jingdong
    Jiang, Biao
    Qiu, Zehao
    Meng, Yifei
    Zhang, Xiaolin
    Yu, Shaoqing
    Dai, Fu
    Qian, Yue
    [J]. COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING, 2024,
  • [5] Prediction of cyanobacteria blooms in the lower Han River (South Korea) using ensemble learning algorithms
    Shin, Jihoon
    Yoon, Seonghyeon
    Cha, Yoonkyung
    [J]. DESALINATION AND WATER TREATMENT, 2017, 84 : 31 - 39
  • [6] Queue-Based Resampling for Online Class Imbalance Learning
    Malialis, Kleanthis
    Panayiotou, Christos
    Polycarpou, Marios M.
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2018, PT I, 2018, 11139 : 498 - 507
  • [7] Distribution Based Ensemble for Class Imbalance Learning
    Mustafa, Ghulam
    Niu, Zhendong
    Yousif, Abdallah
    Tarus, John
    [J]. FIFTH INTERNATIONAL CONFERENCE ON THE INNOVATIVE COMPUTING TECHNOLOGY (INTECH 2015), 2015, : 5 - 10
  • [8] Unsupervised Ensemble Learning for Class Imbalance Problems
    Liu, Zihan
    Wu, Dongrui
    [J]. 2018 CHINESE AUTOMATION CONGRESS (CAC), 2018, : 3593 - 3600
  • [9] Resampling Techniques Study on Class Imbalance Problem in Credit Risk Prediction
    Zhao, Zixue
    Cui, Tianxiang
    Ding, Shusheng
    Li, Jiawei
    Bellotti, Anthony Graham
    [J]. MATHEMATICS, 2024, 12 (05)
  • [10] Leveraging Imbalance and Ensemble Learning Methods for Improved Load Prediction in Cloud Computing Systems
    Daraghmeh, Mustafa
    Agarwal, Anjali
    Jararweh, Yaser
    [J]. IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023, : 1687 - 1692