Two-stage meta-ensembling machine learning model for enhanced water quality forecasting

Cited by: 3
Authors
Heydari, Sepideh [1]
Nikoo, Mohammad Reza [2]
Mohammadi, Ali [3]
Barzegar, Rahim [4]
Affiliations
[1] Univ Tehran, Fac Environm Engn, Dept Environm Engn, Tehran, Iran
[2] Sultan Qaboos Univ, Dept Civil & Architectural Engn, Muscat, Oman
[3] Sharif Univ Technol, Dept Ind Engn, Tehran, Iran
[4] Univ Quebec Abitibi Temiscamingue UQAT, Res Inst Mines & Environm RIME, Groundwater Res Grp GRES, Amos, PQ, Canada
Keywords
Water quality forecasting; Machine learning; Multi-objective optimization; Genetic algorithm; Grey Wolf Optimizer; Chlorophyll-a and Dissolved oxygen; PREDICTION; IMPLEMENTATION; DECOMPOSITION; PERFORMANCE; STREAMFLOW; RESOURCES; SYSTEM
DOI
10.1016/j.jhydrol.2024.131767
Chinese Library Classification
TU [Building Science]
Discipline classification code
0813
Abstract
Accurate short-term forecasting of water quality variables (WQVs) such as dissolved oxygen (DO) and chlorophyll-a (Chl-a) is crucial for the effective management of aquatic resources. This study introduces a two-stage optimization-ensembling framework that integrates the Grey Wolf Optimizer (GWO) and the Nondominated Sorting Genetic Algorithm II (NSGA-II) to enhance the forecasting capabilities of machine learning (ML) models. Focusing on Small Prespa Lake, Greece, we implemented a diverse set of ML techniques, including eXtreme Gradient Boosting (XGB), Gradient Boosting Regressor (GBR), Light Gradient-Boosting Machine (LightGBM), and Multilayer Perceptron (MLP). These models were tuned using GWO to optimize their performance on the WQVs predicted six hours in advance. Our methodology employed data preprocessing techniques, including lag-time feature engineering and principal component analysis (PCA), to handle the high dimensionality of the dataset. Lag times ranging from 6 to 24 hours were evaluated, with the 24-hour lag proving the most effective in using historical data to improve forecasting accuracy. The GWO not only facilitated hyperparameter tuning but also yielded a notable improvement (7.6%) in the Kling-Gupta Efficiency (KGE) over conventional randomized search methods. Subsequently, NSGA-II was used for multi-objective optimization, constructing model ensembles that outperformed the individual GWO-optimized models by up to 7% in KGE. In comparison to a standard genetic algorithm-based ensemble, the NSGA-II ensemble demonstrated superior effectiveness in balancing solution quality. This approach establishes a new benchmark in water quality forecasting and contributes to proactive environmental monitoring and management strategies.
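The abstract relies on two ideas that are easy to state concretely: lag-time features for a fixed forecast horizon, and the Kling-Gupta Efficiency as the skill score. The following is a minimal sketch of both, not the authors' implementation. It assumes the original (2009) KGE formulation with population standard deviations, a single univariate series, and evenly spaced time steps; the function names `kge` and `make_lagged` are hypothetical, and the record does not state the sampling interval or which KGE variant the paper uses.

```python
import numpy as np


def kge(obs, sim):
    """Kling-Gupta Efficiency, 2009 formulation (assumed variant).

    KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), where
    r is the Pearson correlation, alpha the ratio of standard
    deviations (sim / obs), and beta the ratio of means (sim / obs).
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def make_lagged(series, n_lags, horizon):
    """Build a lagged design matrix for forecasting `horizon` steps ahead.

    Row i of X holds series[i : i + n_lags]; y[i] is the value
    `horizon` steps after the last observation in that row.
    """
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - n_lags - horizon + 1
    if n_rows <= 0:
        raise ValueError("series too short for the requested lags and horizon")
    X = np.stack([series[i:i + n_lags] for i in range(n_rows)])
    first_target = n_lags + horizon - 1
    y = series[first_target:first_target + n_rows]
    return X, y
```

With hourly data, for example, `make_lagged(series, n_lags=24, horizon=6)` would pair each 24-hour window with the value six hours after its end, which mirrors the 24-hour lag and six-hour lead described in the abstract. A perfect forecast gives a KGE of 1, and lower values indicate worse agreement in correlation, variability, or bias.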
Pages: 16