Improving Air Quality Data Reliability through Bi-Directional Univariate Imputation with the Random Forest Algorithm

Authors
Arnaut, Filip [1]
Durdevic, Vladimir [2]
Kolarski, Aleksandra [1]
Sreckovic, Vladimir A. [1]
Jevremovic, Sreten [3]
Affiliations
[1] Univ Belgrade, Inst Phys Belgrade, Pregrev 118, Belgrade 11000, Serbia
[2] Univ Belgrade, Fac Phys, Cara Dusana 13, Belgrade 11000, Serbia
[3] Sci Soc Isaac Newton, Volgina 7, Belgrade 11160, Serbia
Keywords
data imputation; air quality; PM2.5; air pollution; missing observations; machine learning; missing values; prospectivity; pollution
DOI
10.3390/su16177629
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Forecasts of future air pollution levels provide valuable information for the general public, vulnerable populations, and policymakers. High-quality data are essential for precise and reliable forecasts and investigations of air pollution. Missing observations arise when the sensors used to measure air quality parameters malfunction, which produces erroneous measurements or gaps in the dataset and degrades data quality. This research paper presents a novel univariate approach for imputing missing values in air quality data. The approach employs the random forest (RF) algorithm to impute missing observations in a bi-directional (forward and reverse in time) manner for air quality data (particulate matter smaller than 2.5 μm, PM2.5) from the Republic of Serbia. The algorithm was evaluated against simple methods, such as mean and median imputation, for gaps of 24, 48, and 72 h. The results indicate that the algorithm yielded error rates comparable to those of median imputation for all gap durations when imputing the PM2.5 data. Ultimately, the algorithm's higher computational complexity was not justified given the minimal error decrease it achieved compared with the simpler methods. Further research is needed to improve it, for example by using low-code machine learning libraries and time-series forecasting techniques.
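The baseline comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the synthetic hourly series, the gap position, the choice of mean absolute error as the metric, and the helper names `impute_constant` and `mean_absolute_error` are all assumptions made for illustration. The sketch hides a block of known values of 24, 48, or 72 h, fills it with the mean or median of the remaining observations, and scores the fill against the hidden values. The random forest step itself is not shown.

```python
import math
import random
import statistics


def impute_constant(series, gap_start, gap_len, method="median"):
    """Fill series[gap_start : gap_start + gap_len] with the mean or median
    of the values outside the gap (hypothetical helper, not from the paper)."""
    gap = range(gap_start, gap_start + gap_len)
    observed = [v for i, v in enumerate(series) if i not in gap]
    if method == "mean":
        fill = statistics.mean(observed)
    else:
        fill = statistics.median(observed)
    return [fill if i in gap else v for i, v in enumerate(series)]


def mean_absolute_error(truth, imputed, gap_start, gap_len):
    """Average absolute difference between hidden and imputed values in the gap."""
    gap = range(gap_start, gap_start + gap_len)
    return sum(abs(truth[i] - imputed[i]) for i in gap) / gap_len


# Synthetic hourly series with a daily cycle plus noise (assumed data,
# standing in for real PM2.5 measurements).
rng = random.Random(0)
hourly = [
    20 + 10 * math.sin(2 * math.pi * h / 24) + rng.gauss(0, 2)
    for h in range(24 * 14)
]

# Hide gaps of the three durations named in the abstract and score each baseline.
results = {}
gap_start = 100  # arbitrary gap position, chosen for illustration
for gap_len in (24, 48, 72):
    for method in ("mean", "median"):
        filled = impute_constant(hourly, gap_start, gap_len, method)
        results[(gap_len, method)] = mean_absolute_error(
            hourly, filled, gap_start, gap_len
        )

for (gap_len, method), mae in results.items():
    print(f"gap={gap_len:>2} h  method={method:<6}  MAE={mae:.2f}")
```

Any candidate imputation method, including a bi-directional one, could be scored the same way by replacing `impute_constant` and comparing the resulting errors per gap duration.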
Pages: 17
Related papers
33 in total
  • [1] Missing Data Imputation Through the Use of the Random Forest Algorithm
    Pantanowitz, Adam
    Marwala, Tshilidzi
    [J]. ADVANCES IN COMPUTATIONAL INTELLIGENCE, 2009, 61 : 53 - 62
  • [2] Concurrent Imputation and Prediction on EHR data using Bi-Directional GANs: Bi-GANs for EHR imputation and prediction
    Gupta, Mehak
    Phan, Thao-Ly T.
    Bunnell, H. Timothy
    Beheshti, Rahmatollah
    [J]. 12TH ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS (ACM-BCB 2021), 2021,
  • [3] Inference of biological networks using Bi-directional Random Forest Granger causality
    Furqan, Mohammad Shaheryar
    Siyal, Mohammad Yakoob
    [J]. SPRINGERPLUS, 2016, 5
  • [4] Bi-Directional Quality Review in a Large Telestroke Network: Improving Stroke Care Through Mutual Learning
    Majersik, Jennifer J.
    Ekstrom, Erin
    Johnson, Jaleen R.
    Wicks, Heather
    Dart, Chona
    Christensen, Lacey
    Reynolds, Tawnie
    Chung, Lee S.
    Hannon, Peter
    Reddy, Vivek
    Harman, Tyler D.
    [J]. STROKE, 2021, 52
  • [5] Bi-directional Quality Review In A Large Telestroke Network: Improving Stroke Care Through Mutual Learning
    Majersik, Jennifer
    Ekstrom, Erin
    Johnson, Jaleen
    Wicks, Heather
    Dart, Chona
    Christensen, Lacey
    Reynolds, Tawnie
    Chung, Lee
    Hannon, Peter
    Reddy, Vivek
    Harman, Tyler
    [J]. NEUROLOGY, 2021, 96 (15)
  • [6] A bi-directional missing data imputation scheme based on LSTM and transfer learning for building energy data
    Ma, Jun
    Cheng, Jack C. P.
    Jiang, Feifeng
    Chen, Weiwei
    Wang, Mingzhu
    Zhai, Chong
    [J]. ENERGY AND BUILDINGS, 2020, 216
  • [7] A genetic algorithm based dynamic pricing for improving bi-directional interactions with reduced power imbalance
    Huang, Pei
    Xu, Tao
    Sun, Yongjun
    [J]. ENERGY AND BUILDINGS, 2019, 199 : 275 - 286
  • [8] A fast DBSCAN algorithm using a bi-directional HNSW index structure for big data
    Weng, Shaoyuan
    Fan, Zongwen
    Gou, Jin
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (08) : 3471 - 3494
  • [9] Air quality and ventilation fan control based on aerosol measurement in the bi-directional undersea Bomlafjord tunnel
    Indrehus, O
    Aralt, TT
    [J]. JOURNAL OF ENVIRONMENTAL MONITORING, 2005, 7 (04) : 349 - 356
  • [10] Effective Air Quality Prediction Using Reinforced Swarm Optimization and Bi-Directional Gated Recurrent Unit
    Gurumoorthy, Sasikumar
    Kokku, Aruna Kumari
    Falkowski-Gilski, Przemyslaw
    Divakarachari, Parameshachari Bidare
    [J]. SUSTAINABILITY, 2023, 15 (14)