Clustering Imputation for Air Pollution Data

Cited by: 3
Authors
Alahamade, Wedad [1 ,3 ]
Lake, Iain [1 ]
Reeves, Claire E. [2 ]
De la Iglesia, Beatriz [1 ]
Affiliations
[1] Univ East Anglia, Norwich Res Pk, Norwich NR4 7TJ, Norfolk, England
[2] Univ East Anglia, Ctr Ocean & Atmospher Sci, Sch Environm Sci, Norwich, Norfolk, England
[3] Taibah Univ, Medina, Saudi Arabia
Keywords
Air quality; Uncertainty; Time series clustering; Imputation; Missing values; Health
DOI
10.1007/978-3-030-61705-9_48
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Air pollution is a global problem. The assessment of air pollution concentration data is important for evaluating human exposure and the associated risk to health. Unfortunately, air pollution monitoring stations often have periods of missing data or do not measure all pollutants. In this study, we experiment with different approaches to estimate the whole time series for a missing pollutant at a monitoring station as well as missing values within a time series. The main goal is to reduce the uncertainty in air quality assessment. To develop our approach we combine single and multiple imputation, nearest neighbour geographical distance methods and a clustering algorithm for time series. For each station that measures ozone, we produce various imputations for this pollutant and measure the similarity/error between the imputed and the real values. Our results show that imputation by average based on clustering results combined with multiple imputation for missing values is the most reliable and is associated with lower average error and standard deviation.
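The "imputation by average based on clustering results" mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `impute_by_cluster_average`, the station names, the toy values, and the cluster labels are all hypothetical, and the labels are assumed to come from some prior time series clustering step that is not shown. The sketch estimates a whole series for one station as the per-time-step mean of the other stations in its cluster, then scores the estimate against the real values with RMSE, which is one possible choice of error measure.

```python
import math


def impute_by_cluster_average(series, labels, target):
    """Estimate the full series for `target` as the per-time-step mean of
    the other stations that share its cluster label (hypothetical helper)."""
    peers = [s for s in series if s != target and labels[s] == labels[target]]
    if not peers:
        raise ValueError("no other station in the target's cluster")
    estimate = []
    for t in range(len(series[peers[0]])):
        # Skip peers whose value is missing (NaN) at this time step.
        observed = [series[s][t] for s in peers if not math.isnan(series[s][t])]
        estimate.append(sum(observed) / len(observed) if observed else math.nan)
    return estimate


# Toy ozone-like series for four hypothetical stations (values are made up).
series = {
    "A": [10.0, 12.0, math.nan, 14.0],
    "B": [20.0, 22.0, 24.0, 26.0],
    "C": [100.0, 100.0, 100.0, 100.0],
    "D": [15.0, 17.0, 19.0, 20.0],
}
# Cluster labels are assumed to be given by a separate clustering step.
labels = {"A": 0, "B": 0, "C": 1, "D": 0}

# Pretend station D does not measure the pollutant, impute it, then compare.
truth = series["D"]
estimate = impute_by_cluster_average(series, labels, "D")
rmse = math.sqrt(sum((e - v) ** 2 for e, v in zip(estimate, truth)) / len(truth))
print(estimate, rmse)
```

Treating a station that does measure the pollutant as if it did not, as done for station D here, mirrors the evaluation idea in the abstract of comparing imputed values with real ones. The multiple imputation of gaps within a series is a separate step that this sketch does not cover.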
Pages: 585-597
Page count: 13
Related papers
50 in total
  • [11] Comparison of Imputation Methods for Missing Values in Air Pollution Data: Case Study on Sydney Air Quality Index
    Wijesekara, W. M. L. K. N.
    Liyanage, Liwan
    ADVANCES IN INFORMATION AND COMMUNICATION, VOL 2, 2020, 1130 : 257 - 269
  • [12] Gaussian mixture clustering and imputation of microarray data
    Ouyang, M
    Welsh, WJ
    Georgopoulos, P
    BIOINFORMATICS, 2004, 20 (06) : 917 - 923
  • [13] COLI: Collaborative clustering missing data imputation
    Wan, Daoming
    Razavi-Far, Roozbeh
    Saif, Mehrdad
    Mozafari, Niloofar
    PATTERN RECOGNITION LETTERS, 2021, 152 : 420 - 427
  • [14] Reproducibility and imputation of air toxics data
    Le, Hien Q.
    Batterman, Stuart A.
    Wahl, Robert L.
    JOURNAL OF ENVIRONMENTAL MONITORING, 2007, 9 (12): : 1358 - 1372
  • [15] Instance driven clustering for the imputation of missing data in KDD
    Ilango, P.
    Vijayakumar, K.
    Babu, M. Rajasekhara
    INTERNATIONAL JOURNAL OF COMMUNICATION NETWORKS AND DISTRIBUTED SYSTEMS, 2014, 12 (01) : 69 - 81
  • [16] An Improved Mean Imputation Clustering Algorithm for Incomplete Data
    Shi, Hong
    Wang, Pingxin
    Yang, Xin
    Yu, Hualong
    NEURAL PROCESSING LETTERS, 2022, 54 (05) : 3537 - 3550
  • [17] An Improved Mean Imputation Clustering Algorithm for Incomplete Data
    Hong Shi
    Pingxin Wang
    Xin Yang
    Hualong Yu
    Neural Processing Letters, 2022, 54 : 3537 - 3550
  • [18] CO-CLUSTERING OF MULTIVARIATE FUNCTIONAL DATA FOR THE ANALYSIS OF AIR POLLUTION IN THE SOUTH OF FRANCE
    Bouveyron, Charles
    Jacques, Julien
    Schmutz, Amandine
    Simoes, Fanny
    Bottini, Silvia
    ANNALS OF APPLIED STATISTICS, 2022, 16 (03): : 1400 - 1422
  • [19] Comparison of self organizing maps clustering with supervised classification for air pollution data sets
    Bougoudis, Ilias
    Iliadis, Lazaros
    Spartalis, Stefanos
    IFIP Advances in Information and Communication Technology, 2014, 436 : 424 - 435
  • [20] Mean Imputation Techniques for Filling the Missing Observations in Air Pollution Dataset
    Noor, M. N.
    Yahaya, A. S.
    Ramli, N. A.
    Al Bakri, A. M. Mustafa
    ADVANCED MATERIALS ENGINEERING AND TECHNOLOGY II, 2014, 594-595 : 902 - +