Clustering Imputation for Air Pollution Data

Cited by: 3
Authors
Alahamade, Wedad [1 ,3 ]
Lake, Iain [1 ]
Reeves, Claire E. [2 ]
De la Iglesia, Beatriz [1 ]
Affiliations
[1] Univ East Anglia, Norwich Res Pk, Norwich NR4 7TJ, Norfolk, England
[2] Univ East Anglia, Ctr Ocean & Atmospher Sci, Sch Environm Sci, Norwich, Norfolk, England
[3] Taibah Univ, Medina, Saudi Arabia
Keywords
Air quality; Uncertainty; Time series clustering; Imputation; Missing values; Health
DOI
10.1007/978-3-030-61705-9_48
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Air pollution is a global problem. The assessment of air pollution concentration data is important for evaluating human exposure and the associated risk to health. Unfortunately, air pollution monitoring stations often have periods of missing data or do not measure all pollutants. In this study, we experiment with different approaches to estimate the whole time series for a missing pollutant at a monitoring station as well as missing values within a time series. The main goal is to reduce the uncertainty in air quality assessment. To develop our approach we combine single and multiple imputation, nearest neighbour geographical distance methods and a clustering algorithm for time series. For each station that measures ozone, we produce various imputations for this pollutant and measure the similarity/error between the imputed and the real values. Our results show that imputation by average based on clustering results combined with multiple imputation for missing values is the most reliable and is associated with lower average error and standard deviation.
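The abstract's "imputation by average based on clustering results" can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes cluster labels are already available from some time series clustering step, and the function names, station names, and values below are hypothetical. The multiple imputation step for gaps within a series is not shown; missing values in peer stations are simply ignored when averaging.

```python
import numpy as np

def cluster_average_imputation(series, labels, target):
    """Estimate the target station's whole series as the per-time-step mean
    of the other stations in the same cluster, ignoring missing values."""
    peers = [s for s in series if s != target and labels[s] == labels[target]]
    if not peers:
        raise ValueError("no other station in the target's cluster")
    stacked = np.vstack([series[s] for s in peers])
    counts = np.sum(~np.isnan(stacked), axis=0)   # peers observed at each step
    sums = np.nansum(stacked, axis=0)             # NaN treated as 0 in the sum
    # Leave a step as NaN when no peer has an observation there.
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)

def rmse(estimate, truth):
    """Root mean squared error over time steps where both values exist."""
    mask = ~np.isnan(estimate) & ~np.isnan(truth)
    return float(np.sqrt(np.mean((estimate[mask] - truth[mask]) ** 2)))

# Hypothetical ozone series for four stations (NaN marks a missing value).
series = {
    "A": np.array([10.0, 12.0, np.nan, 16.0]),
    "B": np.array([14.0, np.nan, 18.0, 20.0]),
    "C": np.array([50.0, 52.0, 54.0, 56.0]),
    "T": np.array([13.0, 12.0, 18.0, 17.0]),
}
# Assumed output of a clustering step: A, B and T share a cluster.
labels = {"A": 0, "B": 0, "C": 1, "T": 0}

# Treat station T as if it did not measure ozone, then compare with its real values.
estimate = cluster_average_imputation(series, labels, "T")
error = rmse(estimate, series["T"])
```

Holding out a station that does measure the pollutant, as in the last two lines, mirrors the evaluation the abstract describes: the imputed series is compared against the real one to obtain an error.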
Pages: 585-597
Number of pages: 13