Stream water quality prediction using boosted regression tree and random forest models

Authors
Ali O. Alnahit
Ashok K. Mishra
Abdul A. Khan
Affiliations
[1] Clemson University, Glenn Department of Civil Engineering
[2] King Saud University, Department of Civil Engineering
Keywords
Water quality; Machine learning algorithms; Random forests; Boosted regression trees
DOI: Not available
Abstract
Reliable water quality prediction can improve environmental flow monitoring and the sustainability of stream ecosystems. In this study, we compared two machine learning methods for predicting water quality parameters, namely total nitrogen (TN), total phosphorus (TP), and turbidity (TUR), for 97 watersheds located in the Southeast Atlantic region of the USA. The modeling framework incorporates multiple climate and watershed variables (characteristics) that often control water quality indicators in different landscapes. Three techniques, namely stepwise regression (SR), the Least Absolute Shrinkage and Selection Operator (LASSO), and a genetic algorithm (GA), were implemented to identify appropriate predictors out of 28 climate and catchment-related variables. The selected predictors were then used to develop Random Forest (RF) and Boosted Regression Tree (BRT) models for water quality prediction in the selected watersheds. The results highlighted that while both algorithms provided reasonable results (based on statistical metrics), the RF algorithm was easier to train and more robust to overfitting. Partial dependence plots revealed complex, nonlinear relationships between individual predictors and the water quality indicators. The thresholds obtained from the partial dependence plots showed that the median values of TN and TP in streams increase significantly when the percentage of urban and agricultural land exceeds 40% and 43% of the watershed area, respectively. Furthermore, as soil hydraulic conductivity increases, runoff decreases, resulting in lower turbidity levels in streams. Therefore, identifying key watershed characteristics and their critical thresholds can help watershed managers create appropriate regulations for managing and sustaining healthy stream ecosystems. In addition, the forecasting models can improve water quality predictions in ungauged watersheds.
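The abstract reports thresholds read from partial dependence plots. The following is a minimal sketch of how a one-variable partial dependence curve is computed, not the authors' code. It assumes any fitted regressor (RF, BRT, or otherwise) can be wrapped as a plain Python callable. The model `toy_model`, its hard-coded step at 40% urban land, the feature layout, and the data rows are all hypothetical stand-ins, not the study's models or data. The "largest jump" rule for locating a threshold is likewise an illustrative heuristic and may differ from how the authors read their plots.

```python
from statistics import mean
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], float]


def partial_dependence(model: Model, data: Sequence[Row], feature: int,
                       grid: Sequence[float]) -> List[float]:
    """Average prediction when column `feature` is forced to each grid value.

    For each grid value v, every row is copied with column `feature` set to v,
    and the model's predictions on those copies are averaged.
    """
    curve = []
    for v in grid:
        preds = []
        for row in data:
            x = list(row)
            x[feature] = v
            preds.append(model(x))
        curve.append(mean(preds))
    return curve


def largest_jump(grid: Sequence[float], curve: Sequence[float]) -> float:
    """Return the grid value right after the largest increase in the curve.

    A crude, illustrative way to read a threshold off a partial dependence curve.
    """
    best_i = max(range(1, len(curve)), key=lambda i: curve[i] - curve[i - 1])
    return grid[best_i]


# Hypothetical stand-in for a fitted model: predicted TN jumps by 1.0 once
# urban land exceeds 40% (hard-coded here for illustration only).
def toy_model(x: Row) -> float:
    urban_pct, rainfall_mm = x
    return 0.5 + (1.0 if urban_pct > 40 else 0.0) + 0.001 * rainfall_mm


# Hypothetical rows: [urban land (% of watershed), annual rainfall (mm)].
data = [[10.0, 1200.0], [35.0, 1100.0], [55.0, 1300.0], [70.0, 1000.0]]
grid = [float(g) for g in range(0, 85, 5)]  # 0, 5, ..., 80 percent
curve = partial_dependence(toy_model, data, feature=0, grid=grid)
print(largest_jump(grid, curve))  # 45.0
```

Because the curve is only evaluated at grid points, the detected threshold (45.0 here) is the first grid value above the true step at 40; a finer grid narrows that gap. Averaging over the observed rows, rather than fixing the other predictors at their means, is what distinguishes partial dependence from a simple one-at-a-time sensitivity plot.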
Pages: 2661-2680
Page count: 19