An evaluation of various data pre-processing techniques with machine learning models for water level prediction

Authors
Ervin Shan Khai Tiu
Yuk Feng Huang
Jing Lin Ng
Nouar AlDahoul
Ali Najah Ahmed
Ahmed Elshafie
Affiliations
[1] Universiti Tunku Abdul Rahman, Department of Civil Engineering, Lee Kong Chian Faculty of Engineering and Science
[2] UCSI University, Department of Civil Engineering, Faculty of Engineering, Technology and Built Environment
[3] Multimedia University, Faculty of Engineering
[4] University Tenaga Nasional (UNITEN), Institute of Energy Infrastructure (IEI), Department of Civil Engineering, College of Engineering
[5] University of Malaya, Department of Civil Engineering, Faculty of Engineering
Source
Natural Hazards | 2022 / Volume 110
Keywords
Artificial neural network; Bagging; Boosting; River water level prediction; Support vector regression; Variational Mode Decomposition
Abstract
Floods are the most frequent type of natural disaster. They destroy wildlife habitat, damage bridges, railways, roads, and properties, and put millions of people at risk. As such, flood detection systems have been developed to monitor changes in water level and raise an alarm when danger is imminent. River water level prediction is a significant task in flood mitigation planning and floodplain management. Feeding raw rainfall series directly into machine learning (ML) regression methods usually does not yield sufficiently good prediction accuracy. The raw data should first be pre-processed using specific techniques to enhance their quality before being passed to the prediction methods. This paper addresses the stated problem by applying five data pre-processing techniques, namely Variational Mode Decomposition (VMD), Bagging, Boosting, Bagging-VMD, and Boosting-VMD, to enhance the quality of the input data and thereby improve model accuracy. The five proposed pre-processing techniques were applied to the observed daily rainfall series of the Dungun river basin, Malaysia, for the Northeast Monsoon period (November to February) from 1996 to 2016. Two machine learning models, the artificial neural network (ANN) and support vector regression (SVR), served as the base models (Ori) and were used in conjunction with the data pre-processing methods. The ML methods were compared with and without data pre-processing. Water level predictions from SVR and ANN combined with Boosting-VMD were superior to those from the base original models (Ori) alone. The advantage of the enhanced models (founded on SVR and ANN, respectively) over the original models (SVR and ANN) is best reflected in the performance statistics.
Numerical results in terms of root mean square error (RMSE) of (0.42, 0.20 vs 1.85, 1.82), mean absolute percentage error (MAPE) of (4.36, 2.82 vs 18.89, 22.56), mean absolute error (MAE) of (0.28, 0.16 vs 1.25, 1.41), and Nash–Sutcliffe efficiency coefficient (NSE) of (0.96, 0.99 vs 0.25, 0.27) were obtained for the respective models. Additionally, data visualization graphs such as hydrographs, residual hydrographs, peak estimates, and box-and-whisker plots were used to compare the data pre-processing techniques. The experimental results showed that both the Boosting and the Boosting-VMD methods outperformed the other techniques. The Boosting-ANN model was found to be the best model for predicting river water levels, with the lowest RMSE (0.19), MAPE (2.72), and MAE (0.15) and the highest NSE (0.99).
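The four performance statistics named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard textbook definitions of RMSE, MAE, MAPE (in percent), and NSE; the abstract does not state the exact formulas the authors used, and the function names and sample water levels below are hypothetical, chosen only for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean square error: square root of the mean squared residual."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error: mean of the absolute residuals."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def mape(observed, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no observed value is zero (division by the observation).
    """
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the residual sum of
    squares to the sum of squared deviations of the observations from their
    mean. A value of 1 is a perfect fit; 0 is no better than the mean."""
    mean_obs = sum(observed) / len(observed)
    residual_ss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    total_ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual_ss / total_ss


# Hypothetical water levels (illustrative only, not data from the paper).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

for name, metric in [("RMSE", rmse), ("MAE", mae), ("MAPE", mape), ("NSE", nse)]:
    print(f"{name}: {metric(observed, predicted):.4f}")
```

Lower RMSE, MAE, and MAPE and an NSE closer to 1 indicate a better fit, which is the sense in which the abstract ranks the enhanced models above the base models.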
Pages: 121-153
Page count: 32