Improving Air Quality Data Reliability through Bi-Directional Univariate Imputation with the Random Forest Algorithm

Times cited: 0
Authors
Arnaut, Filip [1]
Durdevic, Vladimir [2]
Kolarski, Aleksandra [1]
Sreckovic, Vladimir A. [1]
Jevremovic, Sreten [3]
Affiliations
[1] Univ Belgrade, Inst Phys Belgrade, Pregrev 118, Belgrade 11000, Serbia
[2] Univ Belgrade, Fac Phys, Cara Dusana 13, Belgrade 11000, Serbia
[3] Sci Soc Isaac Newton, Volgina 7, Belgrade 11160, Serbia
Keywords
data imputation; air quality; PM2.5; air pollution; missing observations; machine learning; MISSING VALUES; PROSPECTIVITY; POLLUTION;
DOI
10.3390/su16177629
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Forecasts of future air pollution levels provide valuable information for the general public, vulnerable populations, and policymakers. High-quality data are essential for precise and reliable forecasts and investigations of air pollution. Missing observations arise when the sensors used to measure air quality parameters malfunction, producing erroneous measurements or gaps in the dataset and degrading data quality. This paper presents a novel univariate approach for imputing missing values in air quality data. The approach employs the random forest (RF) algorithm to impute missing observations bi-directionally (forward and reverse in time) in data on particulate matter smaller than 2.5 μm (PM2.5) from the Republic of Serbia. It was evaluated against simple methods, such as mean and median imputation, for gaps of 24, 48, and 72 h. The results indicate that the algorithm yielded error rates comparable to those of median imputation for all gap durations when imputing the PM2.5 data. Ultimately, the algorithm's higher computational complexity was not justified by the minimal error decrease it achieved compared with the simpler methods. Further research is needed to improve the approach, for example by using low-code machine learning libraries and time-series forecasting techniques.
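As a rough illustration of the idea summarized in the abstract, the sketch below fills a single 24 h gap in a synthetic hourly series and compares the result with mean and median imputation. It is not the authors' implementation: the lag-feature construction, the 24-lag window, the recursive forecasting loop, the simple averaging of the forward and reverse predictions, the synthetic data, and every function name here are assumptions made for the example. It relies on NumPy and scikit-learn's `RandomForestRegressor`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def make_lagged(series, n_lags):
    """Build (X, y): each row of X holds the n_lags values that precede y."""
    X = np.array([series[i - n_lags:i] for i in range(n_lags, len(series))])
    y = series[n_lags:]
    return X, y


def forecast(history, steps, n_lags=24, seed=0):
    """Fit an RF on lagged values of `history`, then predict `steps` values recursively."""
    X, y = make_lagged(history, n_lags)
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    model.fit(X, y)
    window = list(history[-n_lags:])
    preds = []
    for _ in range(steps):
        features = np.array(window[-n_lags:]).reshape(1, -1)
        nxt = model.predict(features)[0]
        preds.append(nxt)
        window.append(nxt)  # feed the prediction back in as the newest lag
    return np.array(preds)


def impute_gap_bidirectional(series, start, end, n_lags=24):
    """Fill series[start:end] from a forward and a time-reversed RF forecast.

    Assumption: the two directions are combined by a plain average.
    """
    gap = end - start
    fwd = forecast(series[:start], gap, n_lags)
    # Reverse the data after the gap, forecast on it, then flip back into time order.
    bwd = forecast(series[end:][::-1], gap, n_lags)[::-1]
    filled = series.copy()
    filled[start:end] = (fwd + bwd) / 2.0
    return filled


# Synthetic hourly "PM2.5-like" series (hypothetical data, 30 days with a daily cycle).
rng = np.random.default_rng(0)
hours = np.arange(24 * 30)
truth = 25 + 10 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 2, hours.size)

start, end = 300, 324  # one artificial 24 h gap
observed = truth.copy()
observed[start:end] = np.nan

rf_filled = impute_gap_bidirectional(observed, start, end)

mean_filled = observed.copy()
mean_filled[start:end] = np.nanmean(observed)

median_filled = observed.copy()
median_filled[start:end] = np.nanmedian(observed)


def mae(filled):
    """Mean absolute error over the gap only, against the held-out truth."""
    return float(np.mean(np.abs(filled[start:end] - truth[start:end])))


for name, filled in [("RF bi-directional", rf_filled),
                     ("mean", mean_filled),
                     ("median", median_filled)]:
    print(f"{name}: MAE = {mae(filled):.2f}")
```

Masking a known stretch of data and scoring only that stretch mirrors the kind of evaluation the abstract describes for 24, 48, and 72 h gaps; changing `end` to `start + 48` or `start + 72` would extend the sketch to the longer durations. Results on this synthetic series say nothing about the error rates reported in the paper.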
Pages: 17