Investigating the impact of input variable selection on daily solar radiation prediction accuracy using data-driven models: a case study in northern Iran

Authors
Mohammad Sina Jahangir
Seyed Mostafa Biazar
David Hah
John Quilty
Mohammad Isazadeh
Affiliations
[1] University of Waterloo, Department of Civil and Environmental Engineering
[2] University of Tabriz, Department of Water Engineering, Faculty of Agriculture
Keywords
Data-driven models; Solar radiation prediction; Input variable selection; Edgeworth approximation-based conditional mutual information; Iran
DOI
Not available
Abstract
Data-driven models have been explored in numerous studies for solar radiation ($R_s$) prediction. However, the use of different input variable selection (IVS) methods to improve $R_s$ prediction accuracy has been largely neglected. This study explores several IVS methods, including the Gamma test (GT), Procrustes analysis (PA) and Edgeworth approximation-based conditional mutual information (EA), and evaluates their ability to improve $R_s$ prediction accuracy by coupling them with four popular nonlinear data-driven models: multilayer perceptron (MLP), support vector machine, extreme learning machine and multi-gene genetic programming (MGGP). The partial correlation input selection method was coupled with multiple linear regression to serve as a linear benchmark. Meteorological data from eight stations in northern Iran were used to build the $R_s$ prediction models. The type and number of selected variables differed between stations and depended on the IVS method.
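The abstract names partial correlation input selection plus multiple linear regression (MLR) as the linear benchmark but does not spell out the procedure. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes greedy forward selection, where at each step the candidate with the largest absolute partial correlation with the target (given the inputs already chosen) is added, and selection stops when that value falls below a fixed `threshold`. The function names `partial_corr`, `select_inputs_pc`, `fit_mlr` and `predict_mlr`, the threshold value and the synthetic data are all hypothetical; no meteorological data from the study is used.

```python
import numpy as np


def partial_corr(x, y, Z):
    """Correlation of x and y after regressing both on the columns of Z plus an intercept."""
    A = np.column_stack([np.ones(len(x)), Z])  # works when Z has zero columns
    rx = x - A @ np.linalg.lstsq(A, x, rcond=None)[0]
    ry = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    if rx.std() < 1e-12 or ry.std() < 1e-12:
        return 0.0  # nothing left to correlate once Z is accounted for
    return float(np.corrcoef(rx, ry)[0, 1])


def select_inputs_pc(X, y, threshold=0.2):
    """Hypothetical greedy forward selection by absolute partial correlation.

    `threshold` plays the role of an IVS hyper-parameter: it decides when to stop.
    """
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        Z = X[:, selected]
        scores = [abs(partial_corr(X[:, j], y, Z)) for j in remaining]
        best = int(np.argmax(scores))
        if scores[best] < threshold:
            break
        selected.append(remaining.pop(best))
    return selected


def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]


# Synthetic example: only candidates 0 and 2 actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.1 * rng.normal(size=500)
selected = select_inputs_pc(X, y, threshold=0.2)
coef = fit_mlr(X[:, selected], y)
weights = dict(zip(selected, coef[1:]))  # candidate index -> fitted MLR weight
```

Lowering `threshold` lets weakly related candidates in and raising it prunes more aggressively, which is one concrete way an IVS hyper-parameter can change which inputs a model sees.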
Models using EA selected fewer variables than those using GT and achieved higher accuracy, while models using PA selected the fewest variables of all methods but could not adequately predict $R_s$. Predictive performance also varied substantially depending on which IVS method was paired with which model type. For example, coupling MLP, the model with the best average performance, with EA instead of PA improved (decreased) the normalized root mean square error (nRMSE) by ~27%. MGGP produced the least accurate predictions: when EA was used for IVS, its nRMSE was up to 40% higher than that of MLP. Finally, IVS hyper-parameter adjustment (which is routinely overlooked in the literature) profoundly affected the results and is recommended as a very important step when developing data-driven models for solar radiation prediction.
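The comparisons above are relative changes in nRMSE. The abstract does not state how RMSE is normalized, so the sketch below assumes one common convention (RMSE divided by the mean of the observations, in percent); other studies normalize by the range or standard deviation instead. The helper names `nrmse` and `relative_change` and all numbers are hypothetical toy values, not results from the study.

```python
import numpy as np


def nrmse(observed, predicted):
    """RMSE divided by the mean of the observations, in percent (assumed convention)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((observed - predicted) ** 2))
    return 100.0 * rmse / np.mean(observed)


def relative_change(reference, new):
    """Percent change of `new` relative to `reference`; negative means a decrease."""
    return 100.0 * (new - reference) / reference


# Toy data: every error is +/-1, so RMSE = 1 and the observed mean is 13.
obs = np.array([10.0, 12.0, 14.0, 16.0])
pred = np.array([11.0, 11.0, 15.0, 15.0])
score = nrmse(obs, pred)  # 100 / 13, about 7.69 %

# Hypothetical nRMSE values: 20 % -> 14.6 % is a ~27 % decrease,
# and 10 % -> 14 % is a 40 % increase relative to the reference model.
drop = relative_change(20.0, 14.6)
rise = relative_change(10.0, 14.0)
```

Note that a "27% improvement" is relative to the reference model's nRMSE, not a drop of 27 percentage points.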
Pages: 225-249
Published in: Stochastic Environmental Research and Risk Assessment, 2022, 36(1)