An evaluation of methods for imputation of missing trace element data in groundwaters

Cited by: 24
Authors
Dickson, Bruce L.
Giblin, Angela M.
Affiliations
[1] Dickson Res Pty Ltd, Gladesville, NSW 2111, Australia
[2] CSIRO, N Ryde, NSW 2113, Australia
Keywords
groundwater; uranium; self-organizing map; expectation maximization; imputation; Murray Basin; evaporation ponds
DOI
10.1144/1467-7873/07-127
Chinese Library Classification (CLC)
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification codes
0708; 070902
Abstract
Groundwater data-sets with pH and major cation-anion chemistry are widely available, but data that include trace metals are much rarer. This paper examines two methods of data imputation to predict U concentrations using pH, major cations, anions and F in a data-set where some of the U concentrations are missing. The methods evaluated were self-organizing maps (SOM) and expectation maximization (EM). Evaluations were made using a groundwater data-set of 187 samples from NSW and Victoria, which contained a wide range of U concentrations up to 225 µg/l. Tests made by setting 25% and 50% of the U concentrations to missing showed that, at 25% missing, SOM gave reasonable estimates and identified all the samples with higher U, whereas EM did not clearly identify the higher samples. At 50% missing, neither method could accurately identify the higher U concentrations. Thus, imputation with samples that have missing data included in the training data-set does not appear to be practical. However, a SOM pre-trained on a data-set with no missing U concentrations may be used to impute U concentrations for samples with 100% missing U data. Training on the original data-set and then imputing concentrations for a second set of 360 samples showed that the samples with higher measured U concentrations could generally be identified, but that other samples were also estimated to be U-rich. This method could substantially reduce the number of samples in a large data-set requiring further investigation. The performance of imputation for U reflects the complex interaction of water chemistry, geology and mineralogy that actually determines U concentrations. Imputation is a useful method for improving estimates of data statistics. SOM, through its model-free approach, is a useful addition to the numerical analysis toolbox for geochemists.
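The pre-trained SOM strategy described in the abstract can be sketched in a few lines. The map is trained on samples with complete data, and U is then estimated where it is missing. This is a minimal illustration, not the authors' software or settings. It assumes the variables have already been scaled to comparable ranges, for example by log-transforming and standardizing them. The 5 × 5 rectangular grid, the Gaussian neighbourhood, the linearly decaying learning rate and radius, and the function names `train_som`, `best_matching_unit` and `impute` are all hypothetical choices. Missing values are handled in a common SOM fashion. The best-matching unit (BMU) is found using only the observed components, and each missing component is read from that unit's weight vector.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid node closest to x, using only the components of x that are not None."""
    observed = [k for k, v in enumerate(x) if v is not None]
    return min(
        weights,
        key=lambda node: sum((x[k] - weights[node][k]) ** 2 for k in observed),
    )


def train_som(data, rows=5, cols=5, epochs=200, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rectangular SOM on complete samples (no None values) with online updates.

    Grid size, epochs and the linear decay schedules are illustrative assumptions.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise every node with a copy of a randomly chosen training sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks, floored at 0.5
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighbourhood based on grid distance from the BMU.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def impute(weights, x):
    """Replace each None in x with the matching component of x's best-matching unit."""
    bmu = best_matching_unit(weights, x)
    return [weights[bmu][k] if v is None else v for k, v in enumerate(x)]


if __name__ == "__main__":
    # Toy, made-up data: two scaled "chemistry" variables plus a last column standing in for U.
    training = [[0.02 * j, 0.02 * j, 0.1] for j in range(5)]
    training += [[1.0 - 0.02 * j, 1.0 - 0.02 * j, 0.9] for j in range(5)]
    som = train_som(training)
    print(impute(som, [0.95, 0.97, None]))  # missing last value estimated from the map
    print(impute(som, [0.03, 0.01, None]))
```

Each estimate comes from a prototype (node) vector rather than from an individual sample. Imputed values are therefore pulled toward the typical values of the samples that map to that node, and observed values are returned unchanged.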
Pages: 173-178 (6 pages)