Non-parametric, semi-parametric, and machine learning models for river temperature frequency analysis at ungauged basins

Cited by: 8
Authors
Souaissi, Zina [1 ,2 ]
Ouarda, Taha B. M. J. [1 ]
St-Hilaire, Andre [1 ,3 ]
Affiliations
[1] INRS ETE, Canada Res Chair Stat Hydroclimatol, Inst Natl Rech Sci, Ctr Eau Terre Environm, 490 Couronne, Quebec City, PQ G1K 9A9, Canada
[2] Univ Quebec Montreal, Dept Sci Terre & Atmosphere, Pavillon President Kennedy, Montreal, PQ H2X 3Y7, Canada
[3] Univ New Brunswick, Canadian Rivers Inst, Fredericton, NB, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Thermal regime; Random forest; Extreme gradient boosting; Multivariate adaptive regression splines; Generalized additive models; Regional frequency analysis; Canonical correlation analysis; Adaptive regression splines; Support vector machine; Water temperature; Neural networks; Stream temperature; Statistical model; New Brunswick; Prediction; Climate
DOI
10.1016/j.ecoinf.2023.102107
Chinese Library Classification number
Q14 [Ecology (Bioecology)]
Discipline classification code
071012; 0713
Abstract
River water temperature is essential in regulating many physical and biochemical processes in river systems. Consequently, it is crucial to develop reliable tools for predicting extreme river temperatures at sites with little or no available data. This study compares two machine learning models, random forest (RF) and extreme gradient boosting (XGBoost), with non-parametric multivariate adaptive regression splines (MARS) and semi-parametric generalized additive models (GAMs) for the regional estimation of maximum water temperatures at ungauged locations. Three approaches, linear and non-linear, are also considered for delineating homogeneous regions in the regional frequency analysis: canonical correlation analysis (CCA), neural network-based non-linear canonical correlation analysis (NLCCA), and the use of all stations (ALL). The results indicate that GAM and MARS lead to the best performances. NLCCA + GAM performs best in terms of absolute and relative mean square error, followed by CCA + MARS. Using neighborhood methods significantly improves the performance of the adopted models. The two machine learning models are tested using two variable selection methods: Recursive Feature Elimination (RFE) and Least Absolute Shrinkage and Selection Operator (LASSO). The results, however, do not show any significant differences. These results may be indicative of the flexibility and ability of the GAM and MARS approaches to reproduce thermal extremes, especially under real-world conditions when a limited amount of data is available.
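The abstract ranks the models by absolute and relative error at ungauged sites. A minimal sketch of how such a comparison might be scored is given below, assuming a leave-one-out scheme in which each station is treated in turn as ungauged. The abstract does not state the validation scheme or the exact error formulas, so the functions `rmse`, `relative_rmse`, and `leave_one_out`, the mean-value baseline model, and the sample temperatures are all hypothetical illustrations, not the authors' method.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the units of the data (e.g. degrees C)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def relative_rmse(observed, predicted):
    """Root mean square of the relative errors, in percent (assumed definition)."""
    n = len(observed)
    return 100.0 * math.sqrt(
        sum(((p - o) / o) ** 2 for o, p in zip(observed, predicted)) / n
    )


def leave_one_out(sites, fit, predict):
    """Predict each site from a model fitted on all the other sites.

    sites   -- list of (features, target) pairs, one per station
    fit     -- callable taking a list of sites and returning a model
    predict -- callable taking (model, features) and returning a prediction
    """
    predictions = []
    for i, (features, _) in enumerate(sites):
        training = sites[:i] + sites[i + 1:]  # hold out station i
        predictions.append(predict(fit(training), features))
    return predictions


# Hypothetical baseline: predict the mean target of the training stations.
def fit_mean(training):
    return sum(target for _, target in training) / len(training)


def predict_mean(model, features):
    return model


# Hypothetical stations: (basin descriptors, maximum water temperature in C).
stations = [((120.0, 45.1), 20.0), ((300.0, 46.3), 22.0), ((80.0, 47.0), 24.0)]
observed = [target for _, target in stations]
predicted = leave_one_out(stations, fit_mean, predict_mean)
print(rmse(observed, predicted), relative_rmse(observed, predicted))
```

Any regional model (GAM, MARS, RF, XGBoost) could be substituted for the baseline by supplying a different `fit` and `predict` pair, so that all candidates are scored on the same held-out stations.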
Pages: 15