Predicting water quality variables using gradient boosting machine: global versus local explainability using SHapley Additive Explanations (SHAP)

Cited by: 0
Authors
Khaled Merabet [1]
Fabio Di Nunno [2]
Francesco Granata [2]
Sungwon Kim [3]
Rana Muhammad Adnan [4]
Salim Heddam [7]
Ozgur Kisi [1]
Mohammad Zounemat-Kermani [5]
Affiliations
[1] Faculty of Science, Department of Civil and Mechanical Engineering (DICEM)
[2] Agronomy Department, Department of Railroad Construction and Safety Engineering
[3] Hydraulics Division, College of Architecture and Urban Planning
[4] University of Cassino and Southern Lazio, Department of Civil Engineering, School of Technology
[5] Dongyang University, Department of Civil Engineering
[6] Guangzhou University, Center for Global Health Research
[7] Ilia State University, School of Civil, Environmental and Architectural Engineering
[8] Shahid Bahonar University of Kerman
[9] Saveetha Institute of Medical and Technical Sciences
[10] Korea University
Keywords
Modelling; Water quality; Chl-a; DO; TU; AdaBoost; Boosting models; SHAP;
DOI
10.1007/s12145-025-01796-y
Abstract
Water quality assessment is critical for ensuring the health of aquatic ecosystems and for managing water resources effectively. However, accurately predicting key water quality variables remains challenging because of the complex interactions between environmental factors and anthropogenic influences. This study proposes a new modelling framework for predicting three water quality variables: (i) dissolved oxygen concentration (DO), (ii) water turbidity (TU), and (iii) chlorophyll a (Chl-a). Six machine learning models, namely adaptive boosting (AdaBoost), categorical boosting (CatBoost), histogram gradient boosting (HistGBRT), light gradient boosting machine (LightGBM), natural gradient boosting (NGBoost), and extreme gradient boosting (XGBoost), were applied and compared using combinations of a large number of water quality variables. All models were developed using data collected from three stations: (i) USGS 05543010 Illinois River at Seneca, Illinois County, (ii) USGS 05586300 Illinois River at Florence, Illinois County, and (iii) USGS 05553700 Illinois River at Starved Rock, Illinois County, USA. SHapley Additive exPlanations (SHAP) were adopted for model interpretability and feature ranking, and all models were compared using several numerical indices and graphical representations. The results lead to the following conclusions. DO concentration can be predicted very well: CatBoost was the best model, with RMSE = 0.430, MAE = 0.326, R = 0.980, and NSE = 0.961. For Chl-a, all models were less accurate, and the best performance was obtained with LightGBM (RMSE = 5.916, MAE = 4.294, R = 0.892, NSE = 0.795). For TU, none of the models was accurate, and very poor performance was obtained. Finally, SHAP helped significantly in understanding the overall contribution of the individual water quality variables to the final prediction.
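The four indices reported in the abstract (RMSE, MAE, R, and NSE) can be computed from paired observed and predicted values. The sketch below uses the standard textbook definitions, taking R as the Pearson correlation coefficient and NSE as the Nash-Sutcliffe efficiency; the abstract does not spell out its formulas, so this reading is an assumption. The function name `evaluate` and the sample values are hypothetical and are not taken from the study.

```python
import math


def evaluate(observed, predicted):
    """Return RMSE, MAE, Pearson R and NSE for paired observed/predicted values.

    Assumes the standard definitions of each index; the abstract does not
    state the exact formulas used by the authors.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    sum_sq_err = sum(e * e for e in errors)

    rmse = math.sqrt(sum_sq_err / n)          # root mean squared error
    mae = sum(abs(e) for e in errors) / n     # mean absolute error

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    if ss_o == 0 or ss_p == 0:
        raise ValueError("R and NSE are undefined for constant sequences")

    # Pearson correlation coefficient between observations and predictions
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    r = cov / math.sqrt(ss_o * ss_p)

    # Nash-Sutcliffe efficiency: 1 minus squared error relative to the
    # squared deviation of the observations from their own mean
    nse = 1.0 - sum_sq_err / ss_o

    return {"RMSE": rmse, "MAE": mae, "R": r, "NSE": nse}


if __name__ == "__main__":
    # Hypothetical DO values (illustrative only, not data from the study)
    obs = [8.1, 7.4, 9.0, 6.5, 10.2]
    pred = [8.0, 7.6, 8.7, 6.9, 10.0]
    for name, value in evaluate(obs, pred).items():
        print(f"{name}: {value:.3f}")
```

RMSE and MAE carry the units of the predicted variable, so they cannot be compared across DO, TU, and Chl-a. R and NSE are dimensionless, which makes them the more natural basis for comparing model skill across the three variables.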
Related papers
50 in total
  • [1] AI for Automating Data Center Operations: Model Explainability in the Data Centre Context Using Shapley Additive Explanations (SHAP)
    Gebreyesus, Yibrah
    Dalton, Damian
    De Chiara, Davide
    Chinnici, Marta
    Chinnici, Andrea
    ELECTRONICS, 2024, 13 (09)
  • [2] Prediction of HHV of fuel by Machine learning Algorithm: Interpretability analysis using Shapley Additive Explanations (SHAP)
    Timilsina, Manish Sharma
    Sen, Subhadip
    Uprety, Bibek
    Patel, Vashishtha B.
    Sharma, Prateek
    Sheth, Pratik N.
    FUEL, 2024, 357
  • [4] Predicting and analyzing flood susceptibility using boosting-based ensemble machine learning algorithms with SHapley Additive exPlanations
    Aydin, Halit Enes
    Iban, Muzaffer Can
    NATURAL HAZARDS, 2023, 116 (03) : 2957 - 2991
  • [5] Predicting and analyzing flood susceptibility using boosting-based ensemble machine learning algorithms with SHapley Additive exPlanations
    Halit Enes Aydin
    Muzaffer Can Iban
    Natural Hazards, 2023, 116 : 2957 - 2991
  • [6] Using Shapley additive explanations to interpret extreme gradient boosting predictions of grassland degradation in Xilingol, China
    Batunacun
    Wieland, Ralf
    Lakes, Tobia
    Nendel, Claas
    GEOSCIENTIFIC MODEL DEVELOPMENT, 2021, 14 (03) : 1493 - 1510
  • [7] Bankruptcy prediction using machine learning and Shapley additive explanations
    Nguyen, Hoang Hiep
    Viviani, Jean-Laurent
    Ben Jabeur, Sami
    REVIEW OF QUANTITATIVE FINANCE AND ACCOUNTING, 2023,
  • [8] Evaluating the importance of vertical environmental variables for albacore fishing grounds in tropical Atlantic Ocean using machine learning and Shapley additive explanations (SHAP) approach
    Zhang, Tianjiao
    Guo, Hu
    Song, Liming
    Yuan, Hongchun
    Sui, Hengshou
    Li, Bin
    FISHERIES OCEANOGRAPHY, 2024,
  • [9] Explainable Risk Assessment of Rockbolts' Failure in Underground Coal Mines Based on Categorical Gradient Boosting and SHapley Additive exPlanations (SHAP)
    Ibrahim, Bemah
    Ahenkorah, Isaac
    Ewusi, Anthony
    SUSTAINABILITY, 2022, 14 (19)
  • [10] Assessment of the Impact of Meteorological Variables on Lake Water Temperature Using the SHapley Additive exPlanations Method
    Amnuaylojaroen, Teerachai
    Ptak, Mariusz
    Sojka, Mariusz
    Water (Switzerland), 2024, 16 (22)