Improving the performance of data-driven techniques through data pre-processing for modelling daily reservoir inflow

Cited by: 15
Authors
Jothiprakash, V. [1]
Kote, Alka S. [1]
Affiliation
[1] Indian Inst Technol, Dept Civil Engn, Bombay 400076, Maharashtra, India
Keywords
data pre-processing; time-lagged recurrent network; model tree; linear genetic programming; moving average; transformation; full-year inflow data; seasonal inflow; Pawana Reservoir; India; ARTIFICIAL NEURAL-NETWORK; RAINFALL-RUNOFF MODELS; RIVER FLOW MODELS; TIME-SERIES MODELS; PREDICTION; TREES; STREAMFLOWS; BASIN; ANN
DOI
10.1080/02626667.2010.546358
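Chinese Library Classification number
TV21 [Water resources survey and water conservancy planning]
Discipline classification code
081501
Abstract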
The effect of data pre-processing on artificial intelligence (AI)-based data-driven techniques, namely artificial neural networks (ANN), model trees (MT) and linear genetic programming (LGP), is studied for Pawana Reservoir in Maharashtra, India. The daily one-step-ahead inflow forecasts are compared with flows generated from a univariate autoregressive integrated moving average (ARIMA) model. For the full-year data series, a large error is found, mainly due to the occurrence of zero values, since the reservoir is located on an intermittent river. Hence, to take the intermittent nature of the data into account, all the techniques are evaluated using two data series: 18 years of daily full-year inflow data (from 1 January to 31 December) and 18 years of daily monsoon season inflow data (from 1 June to 31 October). The relevant range of inputs for each category is selected based on autocorrelation and partial autocorrelation analyses of the inflow series. Conventional pre-processing methods, such as transformation and/or normalization of data, do not perform well because of the large variation in magnitudes, as well as the many zero values (65% of the full-year data series). Therefore, the input data are pre-processed into un-weighted moving average (MA) series of 3 days, 5 days and 7 days. The 3-day MA series performs best, maintaining the peak inflow pattern of the actual data series, while the coarser-scale (5-day and 7-day) MA series flatten the peak inflow pattern, leading to larger errors in peak inflow prediction. The results indicate that AI methods are powerful tools for modelling the daily flow time series with appropriate data pre-processing, in spite of the presence of many zero values. The time-lagged recurrent network (TLRN) ANN modelling technique applied in this study forecasts inflow better than the standard multilayer perceptron (MLP) neural network, especially for the seasonal data series. The MT technique performs equally well for low and medium inflows, but fails to predict the peak inflows. However, LGP outperforms the other AI models, and also the ARIMA model, for all inflow magnitudes. In the LGP model, the daily full-year data series with more zero inflow values performs better than the daily seasonal models.
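The un-weighted moving average pre-processing described in the abstract can be illustrated with a minimal sketch. The abstract does not say whether the window is trailing or centred, so the trailing window used here is an assumption, as are the function name `moving_average` and the synthetic inflow values, which are not taken from the paper. The sketch shows only why wider windows flatten peaks; it is not the authors' implementation.

```python
import numpy as np


def moving_average(inflow, window):
    """Un-weighted moving average over `window` consecutive days.

    Assumption: only complete windows are kept, so the result is
    shorter than the input by window - 1 values.
    """
    x = np.asarray(inflow, dtype=float)
    kernel = np.ones(window) / window  # equal weights: un-weighted MA
    return np.convolve(x, kernel, mode="valid")


# Hypothetical daily inflow with many zeros and one sharp peak,
# loosely mimicking an intermittent river (not data from the paper).
inflow = np.array([0, 0, 0, 2, 10, 40, 12, 4, 1, 0, 0, 0], dtype=float)

# Share of zero-inflow days in the synthetic series.
zero_fraction = np.mean(inflow == 0)

# Smooth with the three window lengths named in the abstract.
smoothed = {w: moving_average(inflow, w) for w in (3, 5, 7)}

# Peak of each smoothed series; wider windows average the peak
# with more low-flow days and therefore lower it.
peaks = {w: float(s.max()) for w, s in smoothed.items()}

for w in (3, 5, 7):
    print(f"{w}-day MA peak: {peaks[w]:.2f} (raw peak: {inflow.max():.2f})")
```

On this synthetic series the smoothed peak drops as the window widens, which matches the abstract's observation that the 5-day and 7-day series lose more of the peak inflow pattern than the 3-day series.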
Pages: 168-186
Number of pages: 19