Clustering Imputation for Air Pollution Data

Times cited: 3
Authors
Alahamade, Wedad [1, 3]
Lake, Iain [1]
Reeves, Claire E. [2]
De la Iglesia, Beatriz [1]
Affiliations
[1] Univ East Anglia, Norwich Res Pk, Norwich NR4 7TJ, Norfolk, England
[2] Univ East Anglia, Ctr Ocean & Atmospher Sci, Sch Environm Sci, Norwich, Norfolk, England
[3] Taibah Univ, Medina, Saudi Arabia
Keywords
Air quality; Uncertainty; Time series clustering; Imputation; Missing values; Health
DOI
10.1007/978-3-030-61705-9_48
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Air pollution is a global problem. The assessment of air pollution concentration data is important for evaluating human exposure and the associated risk to health. Unfortunately, air pollution monitoring stations often have periods of missing data or do not measure all pollutants. In this study, we experiment with different approaches to estimate the whole time series for a missing pollutant at a monitoring station as well as missing values within a time series. The main goal is to reduce the uncertainty in air quality assessment. To develop our approach we combine single and multiple imputation, nearest neighbour geographical distance methods and a clustering algorithm for time series. For each station that measures ozone, we produce various imputations for this pollutant and measure the similarity/error between the imputed and the real values. Our results show that imputation by average based on clustering results combined with multiple imputation for missing values is the most reliable and is associated with lower average error and standard deviation.
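The abstract's "imputation by average based on clustering results" can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes cluster labels have already been produced by some time series clustering step, and the function names, station names, and toy ozone values below are hypothetical. It estimates a held-out station's whole series as the per-timestamp mean of the other stations in its cluster, then measures the error against the station's real values.

```python
import numpy as np
import pandas as pd


def impute_by_cluster_average(series, clusters, target):
    """Estimate the full series of `target` as the per-timestamp mean
    of the other stations that share its cluster label.

    series   : DataFrame, one column per station, one row per timestamp
    clusters : dict mapping station name -> cluster label (assumed given)
    target   : name of the station whose pollutant series is estimated
    """
    label = clusters[target]
    peers = [s for s in series.columns
             if s != target and clusters.get(s) == label]
    if not peers:
        raise ValueError(f"no other station in cluster {label!r}")
    # skipna=True: a peer with a gap is simply left out of that timestamp's mean
    return series[peers].mean(axis=1, skipna=True)


def mean_absolute_error(imputed, observed):
    """Mean absolute error over timestamps where both values exist."""
    mask = imputed.notna() & observed.notna()
    return float((imputed[mask] - observed[mask]).abs().mean())


# Hypothetical toy data: hourly ozone at four stations (made-up values).
ozone = pd.DataFrame({
    "A": [40.0, 42.0, np.nan, 45.0],
    "B": [38.0, 44.0, 41.0, 47.0],
    "C": [10.0, 12.0, 11.0, 13.0],
    "T": [40.0, 43.0, 40.0, 44.0],   # held-out station with real values
})
clusters = {"A": 0, "B": 0, "C": 1, "T": 0}  # assumed clustering output

estimate = impute_by_cluster_average(ozone, clusters, "T")
error = mean_absolute_error(estimate, ozone["T"])
```

Holding out a station that does measure the pollutant, as in the evaluation the abstract describes, is what makes the error computable. The sketch covers only the averaging step, not the single or multiple imputation of gaps within a series.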
Pages: 585-597
Number of pages: 13