A Comparison of Multiple Imputation Methods for Recovering Missing Data in Hydrological Studies

Cited by: 39
Authors
Hamzah, Fatimah Bibi [1, 2]
Hamzah, Firdaus Mohd [1]
Razali, Siti Fatin Mohd [1]
Samad, Hafiza [2]
Affiliations
[1] Univ Kebangsaan Malaysia, Fac Engn & Built Environm, Bangi 43600, Selangor, Malaysia
[2] Kolej Univ Poly Tech Mara Kuala Lumpur, Fac Comp & Multimedia, Jalan 6-91, Kuala Lumpur 56100, Malaysia
Source
CIVIL ENGINEERING JOURNAL-TEHRAN | 2021 / Vol. 7 / Issue 9
Keywords
Missing Data; Streamflow; Robust Regression; CART; k-NN; MLR; LANGAT RIVER-BASIN; STREAMFLOW DATA; TREND ANALYSIS; TIME-SERIES; REGRESSION; FLOW; RECONSTRUCTION; RAINFALL;
DOI
10.28991/cej-2021-03091747
CLC number
TU [Architectural science]
Discipline code
0813
Abstract
Missing data is a common problem in hydrological studies; data reconstruction is therefore critical, particularly when all available records, including incomplete ones, must be used. Missing data can also affect the results of statistical analyses, and the variability in the data may not be adequately estimated. This study therefore compared the performance of three imputation methods in predicting recurrence in streamflow datasets: robust random regression imputation (RRRI), k-nearest neighbours (k-NN), and classification and regression tree (CART). Complete historical daily streamflow records from 2012 to 2014 were used as the training dataset to assess and validate the effectiveness of the imputation methods in addressing missing streamflow data. All three methods, each coupled with multiple linear regression (MLR), were then used to restore streamflow rates in Malaysia's Langat River Basin from 1978 to 2016. The effectiveness of the estimation techniques was evaluated using metrics including the Nash-Sutcliffe efficiency coefficient (CE), root-mean-square error (RMSE), and mean absolute percentage error (MAPE). The results showed that RRRI coupled with MLR (RRRI-MLR) had the lowest RMSE and MAPE values, outperforming all other techniques tested for filling missing data in daily streamflow datasets. This indicates that RRRI-MLR is the best of the compared methods for dealing with missing data in streamflow datasets.
Pages: 1608-1619
Number of pages: 12
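The abstract evaluates the imputation methods with three metrics: CE, RMSE, and MAPE. The sketch below implements their standard textbook definitions in plain Python so the comparison is easier to follow. It is not the authors' code: the function names and the toy series are hypothetical, and it assumes the observed and imputed daily flows are aligned day by day and that no observed flow is zero (MAPE is undefined in that case).

```python
import math
from typing import Sequence


def _check_lengths(observed: Sequence[float], simulated: Sequence[float]) -> None:
    # Both series must be non-empty and aligned day by day.
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")


def nash_sutcliffe(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """CE = 1 - sum((O - S)^2) / sum((O - mean(O))^2); 1.0 is a perfect fit."""
    _check_lengths(observed, simulated)
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("CE is undefined for a constant observed series")
    return 1.0 - sse / sst


def rmse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Root-mean-square error, in the units of the data (e.g. m^3/s)."""
    _check_lengths(observed, simulated)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(sse / len(observed))


def mape(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    _check_lengths(observed, simulated)
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    total = sum(abs((o - s) / o) for o, s in zip(observed, simulated))
    return 100.0 * total / len(observed)


if __name__ == "__main__":
    # Hypothetical toy values, not data from the study.
    obs = [10.0, 20.0, 30.0, 40.0]
    sim = [12.0, 18.0, 33.0, 37.0]
    print(f"CE   = {nash_sutcliffe(obs, sim):.3f}")  # 0.948
    print(f"RMSE = {rmse(obs, sim):.3f}")            # 2.550
    print(f"MAPE = {mape(obs, sim):.3f} %")          # 11.875
```

RMSE is expressed in flow units, MAPE is scale-free, and CE compares the errors against the variance of the observed series, so the three metrics give complementary views of imputation quality.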