Clustering Imputation for Air Pollution Data

Cited by: 3
Authors
Alahamade, Wedad [1, 3]
Lake, Iain [1 ]
Reeves, Claire E. [2 ]
De la Iglesia, Beatriz [1 ]
Affiliations
[1] Univ East Anglia, Norwich Res Pk, Norwich NR4 7TJ, Norfolk, England
[2] Univ East Anglia, Ctr Ocean & Atmospher Sci, Sch Environm Sci, Norwich, Norfolk, England
[3] Taibah Univ, Medina, Saudi Arabia
Keywords
Air quality; Uncertainty; Time series clustering; Imputation; Missing values; Health
DOI
10.1007/978-3-030-61705-9_48
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Air pollution is a global problem. The assessment of air pollution concentration data is important for evaluating human exposure and the associated risk to health. Unfortunately, air pollution monitoring stations often have periods of missing data or do not measure all pollutants. In this study, we experiment with different approaches to estimate the whole time series for a missing pollutant at a monitoring station as well as missing values within a time series. The main goal is to reduce the uncertainty in air quality assessment. To develop our approach we combine single and multiple imputation, nearest neighbour geographical distance methods and a clustering algorithm for time series. For each station that measures ozone, we produce various imputations for this pollutant and measure the similarity/error between the imputed and the real values. Our results show that imputation by average based on clustering results combined with multiple imputation for missing values is the most reliable and is associated with lower average error and standard deviation.
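As a rough illustration of the "imputation by average based on clustering results" idea described in the abstract, the Python sketch below estimates a station's whole ozone series as the time-wise mean of the other stations in its cluster, then scores the estimate against the observed values. It is not the authors' implementation: the station names, the toy values, the cluster labels (assumed to come from some prior time series clustering step) and the helper names `cluster_average` and `rmse` are all hypothetical, and RMSE is only one possible error measure.

```python
import numpy as np


def cluster_average(series, labels, target):
    """Estimate the full series for `target` as the time-wise mean of the
    other stations in the same cluster, ignoring their missing values."""
    members = [s for s in series if s != target and labels[s] == labels[target]]
    if not members:
        raise ValueError(f"no other stations in the cluster of {target!r}")
    stacked = np.vstack([series[s] for s in members])
    return np.nanmean(stacked, axis=0)


def rmse(estimate, observed):
    """Root mean squared error over time steps where both values exist."""
    mask = ~np.isnan(estimate) & ~np.isnan(observed)
    return float(np.sqrt(np.mean((estimate[mask] - observed[mask]) ** 2)))


# Hypothetical toy data: four stations, four time steps, NaN = missing.
series = {
    "A": np.array([10.0, 12.0, np.nan, 16.0]),
    "B": np.array([12.0, 14.0, 15.0, 18.0]),
    "C": np.array([14.0, np.nan, 17.0, 20.0]),
    "D": np.array([40.0, 42.0, 41.0, 43.0]),
}
# Hypothetical cluster labels, assumed to be produced by a clustering step.
labels = {"A": 0, "B": 0, "C": 0, "D": 1}

# Pretend station A does not measure ozone, impute it, then compare.
estimate_a = cluster_average(series, labels, "A")
error_a = rmse(estimate_a, series["A"])
print(estimate_a, error_a)
```

Because the cluster mean is defined at every time step where at least one cluster member has a reading, the same estimate can also fill isolated gaps inside a station's own series. The abstract reports that multiple imputation worked best for those gaps, which this sketch does not attempt.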
Pages: 585-597
Page count: 13
Related papers
50 in total
  • [31] Autonomous and deterministic supervised fuzzy clustering with data imputation capabilities
    Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, 75450 Bukit Beruang, Melaka, Malaysia
    [Unknown]
    Appl. Soft Comput. J., 1 (1117-1125):
  • [32] Autonomous and deterministic supervised fuzzy clustering with data imputation capabilities
    Ming, Lim Kian
    Kiong, Loo Chu
    Soong, Lim Way
    APPLIED SOFT COMPUTING, 2011, 11 (01) : 1117 - 1125
  • [33] Missing Data On Air Pollution
    Atlas, Mark
    HEALTH AFFAIRS, 2011, 30 (07)
  • [34] Imputation of missing data in time series for air pollutants
    Junger, W. L.
    de Leon, A. Ponce
    ATMOSPHERIC ENVIRONMENT, 2015, 102 : 96 - 104
  • [35] The impact of data imputation on air quality prediction problem
    Hua, Van
    Nguyen, Thu
    Dao, Minh-Son
    Nguyen, Hien D.
    Nguyen, Binh T.
    PLOS ONE, 2024, 19 (09):
  • [36] Spectral methods for imputation of missing air quality data
    Shai Moshenberg
    Uri Lerner
    Barak Fishbain
    Environmental Systems Research, 4 (1)
  • [37] Missing data imputation for paired stream and air temperature sensor data
    Li, Han
    Deng, Xinwei
    Smith, Eric
    ENVIRONMETRICS, 2017, 28 (01)
  • [38] The Air Pollution Study Based on Clustering Analysis in China
    Hong Shuifeng
    Zhao Jianhua
    COMPREHENSIVE EVALUATION OF ECONOMY AND SOCIETY WITH STATISTICAL SCIENCE, 2009, : 937 - +
  • [39] ESTIMATION OF MISSING VALUES IN AIR POLLUTION DATASET BY USING VARIOUS IMPUTATION METHODS
    Sukatis, Fahren Fazzer
    Noor, Norazian Mohamed
    Zakaria, Nur Afiqah
    Ul-Saufie, Ahmad Zia
    Suwardi, Annas
    INTERNATIONAL JOURNAL OF CONSERVATION SCIENCE, 2019, 10 (04) : 791 - 804
  • [40] Data Similarity Analysis on Air Pollution Data
    Karazag, Yunus Emre
    Akturk, Isa
    Turhan, Sultan
    Pinarer, Ozgun
    2019 INNOVATIONS IN INTELLIGENT SYSTEMS AND APPLICATIONS CONFERENCE (ASYU), 2019, : 440 - 445