An evaluation of various data pre-processing techniques with machine learning models for water level prediction

Authors
Ervin Shan Khai Tiu
Yuk Feng Huang
Jing Lin Ng
Nouar AlDahoul
Ali Najah Ahmed
Ahmed Elshafie
Affiliations
[1] Universiti Tunku Abdul Rahman, Department of Civil Engineering, Lee Kong Chian Faculty of Engineering and Science
[2] UCSI University, Department of Civil Engineering, Faculty of Engineering, Technology and Built Environment
[3] Multimedia University, Faculty of Engineering
[4] Universiti Tenaga Nasional (UNITEN), Institute of Energy Infrastructure (IEI), Department of Civil Engineering, College of Engineering
[5] University of Malaya, Department of Civil Engineering, Faculty of Engineering
Source
Natural Hazards | 2022 / Vol. 110
Keywords
Artificial neural network; Bagging; Boosting; River water level prediction; Support vector regression; Variational Mode Decomposition
Abstract
Floods are the most frequent type of natural disaster. They destroy wildlife habitat, damage bridges, railways, roads, and property, and put millions of people at risk. Flood detection systems have therefore been developed to monitor changes in water level and raise an alarm when danger is imminent. River water level prediction is an important task in flood mitigation planning and floodplain management. Feeding raw rainfall series directly into machine learning (ML) regression methods usually does not yield sufficiently accurate predictions; the raw data should first be pre-processed with specific techniques to improve their quality before being passed to the prediction methods. This paper addresses that problem by applying five data pre-processing techniques, namely Variational Mode Decomposition (VMD), Bagging, Boosting, Bagging-VMD, and Boosting-VMD, to enhance the quality of the input data and thereby improve model accuracy. The five techniques were applied to the observed daily rainfall series of the Dungun river basin, Malaysia, for the November to February period (Northeast Monsoon) from 1996 to 2016. Two ML models, the artificial neural network (ANN) and the support vector regression (SVR), served as the base models (Ori) and were used in conjunction with the pre-processing methods. The ML methods were compared with and without data pre-processing. Water level predictions from SVR and ANN combined with Boosting-VMD were superior to those from the base models (Ori) alone. The advantage of the enhanced models over the original models is best reflected in the performance statistics.
For the enhanced models (SVR, ANN) versus the original models (SVR, ANN), the root mean square error (RMSE) was 0.42 and 0.20 vs 1.85 and 1.82, the mean absolute percentage error (MAPE) was 4.36 and 2.82 vs 18.89 and 22.56, the mean absolute error (MAE) was 0.28 and 0.16 vs 1.25 and 1.41, and the Nash–Sutcliffe efficiency coefficient (NSE) was 0.96 and 0.99 vs 0.25 and 0.27. Hydrographs, residual hydrographs, peak-estimate plots, and box-and-whisker plots were also used to compare the pre-processing techniques. The experimental results showed that the Boosting and Boosting-VMD methods outperformed the other techniques. The Boosting-ANN model was found to be the best model for predicting river water levels, with the lowest RMSE (0.19), MAPE (2.72), and MAE (0.15) and the highest NSE (0.99).
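The four performance statistics named in the abstract can be illustrated with a minimal sketch. It uses the common textbook definitions of RMSE, MAE, MAPE (expressed in percent), and NSE; the abstract does not state the exact formulas the authors used, so these definitions, the function names, and the sample values are assumptions for illustration only and are not data from the paper.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def mape(observed, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no observed value is zero (division by the observation).
    """
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the squared-error sum
    to the variance sum of the observations around their mean."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance_sum = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / variance_sum


# Hypothetical water levels, chosen only to exercise the functions.
observed_levels = [2.0, 4.0, 6.0, 8.0]
predicted_levels = [2.5, 3.5, 6.5, 7.5]

print("RMSE:", rmse(observed_levels, predicted_levels))
print("MAE: ", mae(observed_levels, predicted_levels))
print("MAPE:", mape(observed_levels, predicted_levels))
print("NSE: ", nse(observed_levels, predicted_levels))
```

Under these definitions, lower RMSE, MAE, and MAPE indicate smaller errors, while an NSE closer to 1 indicates a closer match to the observations, which is the direction of improvement the abstract reports for the enhanced models.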
Pages: 121-153
Number of pages: 32