Predicting uncertainty of machine learning models for modelling nitrate pollution of groundwater using quantile regression and UNEEC methods

Cited by: 161
Authors
Rahmati, Omid [1, 2]
Choubin, Bahram [3]
Fathabadi, Abolhasan [4]
Coulon, Frederic [5]
Soltani, Elinaz [6]
Shahabi, Himan [7]
Mollaefar, Eisa [8]
Tiefenbacher, John [9]
Cipullo, Sabrina [5]
Bin Ahmad, Baharin [10]
Bui, Dieu Tien [11]
Affiliations
[1] Ton Duc Thang Univ, Geog Informat Sci Res Grp, Ho Chi Minh City, Vietnam
[2] Ton Duc Thang Univ, Fac Environm & Labour Safety, Ho Chi Minh City, Vietnam
[3] Univ Tehran, Fac Nat Resources, Karaj, Iran
[4] Gonbad Kavous Univ, Dept Range & Watershed Management, Gonbad Kavous, Golestan, Iran
[5] Cranfield Univ, Sch Water Energy & Environm, Cranfield MK43 0AL, Beds, England
[6] Shiraz Univ, Dept Nat Resources & Environm Engn, Coll Agr, Shiraz, Iran
[7] Univ Kurdistan, Fac Nat Resources, Dept Geomorphol, Sanandaj, Iran
[8] Dept Nat Resources & Watershed Management Golesta, Gonbad Kavous, Iran
[9] Texas State Univ, Dept Geog, San Marcos, TX 78666 USA
[10] UTM, Fac Built Environm & Surveying, Johor Baharu 81310, Malaysia
[11] Duy Tan Univ, Inst Res & Dev, Da Nang 550000, Vietnam
Keywords
Groundwater pollution; Uncertainty assessment; Nitrate concentration; Machine learning; GIS; NEAREST-NEIGHBOR APPROACH; CENTRAL VALLEY; POTENTIAL ZONES; RISK-ASSESSMENT; PRIVATE WELLS; FOREST; GIS; ATTRIBUTES; LINEAMENTS; INFERENCE;
DOI
10.1016/j.scitotenv.2019.06.320
Chinese Library Classification number
X [Environmental science, safety science];
Subject classification code
08 ; 0830 ;
Abstract
Although estimating the uncertainty of models used for modelling nitrate contamination of groundwater is essential in groundwater management, it has been generally ignored. This issue motivates this research to explore the predictive uncertainty of machine-learning (ML) models in this field of study using two different residuals uncertainty methods: quantile regression (QR) and uncertainty estimation based on local errors and clustering (UNEEC). Prediction-interval coverage probability (PICP), the most important of the statistical measures of uncertainty, was used to evaluate uncertainty. Additionally, three state-of-the-art ML models, namely support vector machine (SVM), random forest (RF), and k-nearest neighbor (kNN), were selected to spatially model groundwater nitrate concentrations. The models were calibrated with nitrate concentrations from 80 wells (70% of the data) and then validated with nitrate concentrations from 34 wells (30% of the data). Both uncertainty and predictive performance criteria should be considered when comparing and selecting the best model. The results highlight that the kNN model is the best model because not only did it have the lowest uncertainty based on the PICP statistic in both the QR (0.94) and the UNEEC (in all clusters, 0.85-0.91) methods, but it also had predictive performance statistics (RMSE = 10.63, R² = 0.71) that were relatively similar to those of RF (RMSE = 10.41, R² = 0.72) and better than those of SVM (RMSE = 13.28, R² = 0.58). Determining the uncertainty of ML models used for spatially modelling groundwater-nitrate pollution enables managers to achieve better risk-based decision making and consequently increases the reliability and credibility of groundwater-nitrate predictions. © 2019 Elsevier B.V. All rights reserved.
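As a rough illustration of the PICP statistic named in the abstract, the sketch below computes the fraction of observed values that fall inside their prediction intervals. It assumes the common definition of PICP as the share of observations covered by the interval bounds; the function name `picp` and all numeric values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def picp(observed: Sequence[float],
         lower: Sequence[float],
         upper: Sequence[float]) -> float:
    """Prediction-interval coverage probability.

    Returns the fraction of observed values y_i that satisfy
    lower_i <= y_i <= upper_i (bounds treated as inclusive).
    """
    if not (len(observed) == len(lower) == len(upper)):
        raise ValueError("observed, lower and upper must have equal length")
    if len(observed) == 0:
        raise ValueError("at least one observation is required")
    covered = sum(1 for y, lo, hi in zip(observed, lower, upper)
                  if lo <= y <= hi)
    return covered / len(observed)


# Hypothetical nitrate concentrations and interval bounds (illustrative only).
observed = [12.0, 30.5, 8.2, 45.0, 20.1]
lower = [5.0, 20.0, 10.0, 30.0, 15.0]
upper = [20.0, 40.0, 18.0, 50.0, 25.0]

coverage = picp(observed, lower, upper)
print(f"PICP = {coverage:.2f}")
```

In general, a PICP close to the nominal confidence level of the intervals suggests that the intervals are well calibrated; the abstract does not state which nominal level was used, so none is assumed here.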
Pages: 855-866
Page count: 12
Related papers
50 in total
  • [1] Predicting Ozone Pollution in Urban Areas Using Machine Learning and Quantile Regression Models
    Cueva, Fernando
    Saquicela, Victor
    Sarmiento, Juan
    Cabrera, Fanny
    [J]. INFORMATION AND COMMUNICATION TECHNOLOGIES (TICEC 2021), 2021, 1456 : 281 - 296
  • [2] Estimation of predictive hydrologic uncertainty using the quantile regression and UNEEC methods and their comparison on contrasting catchments
    Dogulu, N.
    Lopez, P. Lopez
    Solomatine, D. P.
    Weerts, A. H.
    Shrestha, D. L.
    [J]. HYDROLOGY AND EARTH SYSTEM SCIENCES, 2015, 19 (07) : 3181 - 3201
  • [3] Predicting pollution incidents through semiparametric quantile regression models
    Roca-Pardinas, J.
    Ordonez, C.
    [J]. STOCHASTIC ENVIRONMENTAL RESEARCH AND RISK ASSESSMENT, 2019, 33 (03) : 673 - 685
  • [4] Predicting pollution incidents through semiparametric quantile regression models
    J. Roca-Pardiñas
    C. Ordóñez
    [J]. Stochastic Environmental Research and Risk Assessment, 2019, 33 : 673 - 685
  • [5] Air Pollution Modelling by Machine Learning Methods
    Vidnerova, Petra
    Neruda, Roman
    [J]. MODELLING, 2021, 2 (04) : 659 - 674
  • [6] Comparative analysis of regression and machine learning methods for predicting fault proneness models
    Singh, Yogesh
    Kaur, Arvinder
    Malhotra, Ruchika
    [J]. INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN TECHNOLOGY, 2009, 35 (2-4) : 183 - 193
  • [7] Predicting Sales Prices of the Houses Using Regression Methods of Machine Learning
    Viktorovich, Parasich Andrey
    Aleksandrovich, Parasich Viktor
    Leopoldovich, Kaftannikov Igor
    Vasilevna, Parasich Irina
    [J]. PROCEEDINGS OF THE 2018 3RD RUSSIAN-PACIFIC CONFERENCE ON COMPUTER TECHNOLOGY AND APPLICATIONS (RPC), 2018,
  • [8] Predicting nitrate exposure from groundwater wells using machine learning and meteorological conditions
    Etheridge, Randall
    Pascual-Gonzalez, Janire
    Hochard, Jacob
    Peralta, Ariane L.
    Vogel, Thomas J.
    [J]. JOURNAL OF THE AMERICAN WATER RESOURCES ASSOCIATION, 2024, 60 (02) : 639 - 651
  • [9] Predicting trace gas concentrations using quantile regression models
    Mercedes Conde-Amboage
    Wenceslao González-Manteiga
    César Sánchez-Sellero
    [J]. Stochastic Environmental Research and Risk Assessment, 2017, 31 : 1359 - 1370
  • [10] Predicting trace gas concentrations using quantile regression models
    Conde-Amboage, Mercedes
    Gonzalez-Manteiga, Wenceslao
    Sanchez-Sellero, Cesar
    [J]. STOCHASTIC ENVIRONMENTAL RESEARCH AND RISK ASSESSMENT, 2017, 31 (06) : 1359 - 1370