An evaluation of methods for imputation of missing trace element data in groundwaters

Cited by: 24
Authors
Dickson, Bruce L.
Giblin, Angela M.
Affiliations
[1] Dickson Res Pty Ltd, Gladesville, NSW 2111, Australia
[2] CSIRO, N Ryde, NSW 2113, Australia
Keywords
groundwater; uranium; self-organizing map; expectation maximization; imputation; Murray Basin; evaporation ponds
DOI
10.1144/1467-7873/07-127
Chinese Library Classification
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification codes
0708; 070902
Abstract
Groundwater data-sets with pH and major cation-anion chemistry are widely available, but data that include trace metals are much rarer. This paper examines two methods of data imputation to predict U concentrations using pH, major cations, anions and F in a data-set where some of the U concentrations are missing. The methods evaluated were self-organizing maps (SOM) and expectation maximization (EM). Evaluations were made using a groundwater data-set of 187 samples from NSW and Victoria, which contained a wide range of U concentrations up to 225 µg/l. Tests made by setting 25% and 50% of the U concentrations to missing showed that, at 25% missing, SOM gave reasonable estimates and identified all the samples with higher U, whereas EM did not clearly identify the higher samples. At 50% missing, neither method could accurately identify the higher U concentrations. Thus, imputation using samples with missing data included in the training data-set does not appear to be practical. However, a SOM pre-trained on a data-set with no missing U concentrations may be used to impute U concentrations for samples with 100% missing U data. Training using the original data-set and then imputing concentrations for a second set of 360 samples showed that the samples with higher measured U concentrations could generally be identified, but that other samples were also estimated to be U-rich. This method could substantially reduce the number of samples in a large data-set requiring further investigation. The performance of imputation for U reflects the complex interaction of water chemistry, geology and mineralogy that actually determines the U concentrations. Imputation is a useful method for improving estimates of data statistics. SOM, through its model-free approach, is a useful addition to the numerical analysis toolbox for geochemists.
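The two test designs described in the abstract (hiding a fraction of the target values inside the training set, and imputing a fully missing target from a pre-trained map) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the data are synthetic, the map size, learning-rate schedule and function names (`train_som`, `impute_with_som`) are assumptions chosen for illustration, and no claim is made about how the paper scaled or transformed its variables.

```python
import numpy as np


def _bmu_diff(row, observed, codebook):
    """Masked difference to every codebook vector, plus the best-matching unit index."""
    # Missing entries contribute zero, so only observed variables drive the match.
    diff = (np.where(observed, row, 0.0) - codebook) * observed
    return diff, int(np.argmin((diff ** 2).sum(axis=1)))


def train_som(data, rows=6, cols=6, epochs=30, seed=0):
    """Train a small rectangular SOM; NaN entries are skipped in matching and updates."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    observed = ~np.isnan(data)
    # Start codebook vectors close to the column means of the observed values.
    codebook = np.nanmean(data, axis=0) + 0.1 * np.nanstd(data, axis=0) * \
        rng.standard_normal((rows * cols, d))
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total, step = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total
            lr = 0.5 * (1.0 - frac) + 0.01                          # decaying learning rate
            sigma = 0.5 * max(rows, cols) * (1.0 - frac) + 0.5      # shrinking neighbourhood
            diff, bmu = _bmu_diff(data[i], observed[i], codebook)
            h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
            codebook += lr * h[:, None] * diff
            step += 1
    return codebook


def impute_with_som(data, codebook):
    """Fill each NaN with the matching component of the row's best-matching unit."""
    observed = ~np.isnan(data)
    out = data.copy()
    for i in range(len(data)):
        if observed[i].all():
            continue
        _, bmu = _bmu_diff(data[i], observed[i], codebook)
        out[i, ~observed[i]] = codebook[bmu, ~observed[i]]
    return out


# Synthetic stand-in data (assumption): 5 predictors plus one target in the last column.
rng = np.random.default_rng(42)
n_samples, n_predictors = 187, 5
predictors = rng.standard_normal((n_samples, n_predictors))
target = predictors @ np.array([1.0, -0.5, 0.8, 0.0, 0.3]) \
    + 0.2 * rng.standard_normal(n_samples)
X_true = np.column_stack([predictors, target])

# Design 1: hide 25% of the target values, train on the incomplete set, then impute.
X_missing = X_true.copy()
hidden = rng.choice(n_samples, size=n_samples // 4, replace=False)
X_missing[hidden, -1] = np.nan
X_imputed = impute_with_som(X_missing, train_som(X_missing))
rmse = np.sqrt(np.mean((X_imputed[hidden, -1] - X_true[hidden, -1]) ** 2))
print(f"RMSE on hidden target values: {rmse:.3f}")

# Design 2: train on the complete set, then impute a second set whose target is 100% missing.
pretrained = train_som(X_true)
second = np.column_stack([rng.standard_normal((360, n_predictors)), np.full(360, np.nan)])
second_imputed = impute_with_som(second, pretrained)
print("Imputed target range in second set:",
      second_imputed[:, -1].min(), second_imputed[:, -1].max())
```

Changing `n_samples // 4` to `n_samples // 2` reproduces the 50% missing design on the synthetic data. The behaviour on this toy set should not be read as evidence about the paper's results.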
Pages: 173-178
Page count: 6