ESTIMATION OF MISSING VALUES IN AIR POLLUTION DATASET BY USING VARIOUS IMPUTATION METHODS

Cited by: 0
Authors
Sukatis, Fahren Fazzer [1]
Noor, Norazian Mohamed [1]
Zakaria, Nur Afiqah [1]
Ul-Saufie, Ahmad Zia [2]
Suwardi, Annas [3]
Affiliations
[1] Univ Malaysia Perlis, SERG, Ctr Excellence Geopolymer & Green Technol CEGeoGT, Sch Environm Engn, Kompleks Pusat Pengajian Jejawi 3, Arau 02600, Perlis, Malaysia
[2] Univ Teknol MARA, Fac Comp & Math Sci, Kampus Permatang Pauh, Jalan Permatang Pauh, Permatang Pauh 13500, Pulau Pinang, Malaysia
[3] Univ Negeri Makassar, Fac Math & Nat Sci, Kampus UNM Parangtambung, Jalan Daeng Tata, Makassar, Indonesia
Keywords
Air pollution; Estimation; Missing data; Imputation methods; Simulation; Performance indicators; SINGLE;
DOI
Not available
Chinese Library Classification number
J [Art];
Discipline classification code
13 ; 1301 ;
Abstract
The aim of this study is to determine the best imputation method for filling gaps of various lengths in an air pollution dataset. Ten imputation methods were applied to fill in the missing values: Series Mean, Linear Interpolation, Mean Nearest Neighbour, Expectation Maximization, Markov Chain Monte Carlo, 12-hour Moving Average, 24-hour Moving Average, and Exponential Smoothing with three smoothing constants (α = 0.2, 0.5, and 0.8). Annual records of hourly monitoring data for ambient temperature, wind speed, humidity, SO2, NO2, O3, CO, and PM10 from Petaling Jaya and Shah Alam, covering 2012 to 2016, were used. Missing data were simulated in these datasets under three patterns that differ in gap length: simple, medium, and complex. Each pattern was simulated at two percentages of missing data, 10% and 20%. The performance of the imputation methods was evaluated using four performance indicators: mean absolute error, root mean squared error, prediction accuracy, and index of agreement. Overall, Expectation Maximization was selected as the best method for filling the simple, medium, and complex patterns of simulated missing data, while Series Mean was the worst.
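The evaluation procedure described in the abstract (delete known values, impute them, then score the estimates against the true values) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract gives no formulas, so the three imputers and three indicators below follow common textbook definitions (the index of agreement is taken as Willmott's d), the function names are hypothetical, and the hourly series is synthetic rather than the Petaling Jaya or Shah Alam data. Prediction accuracy, Expectation Maximization, Markov Chain Monte Carlo, the nearest-neighbour mean, and the moving averages are omitted because the abstract does not specify them.

```python
import math
import random

def series_mean(values):
    """Replace every gap (None) with the mean of all observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def linear_interpolation(values):
    """Fill each gap on the straight line between its nearest observed
    neighbours; gaps at either end take the nearest observed value."""
    known = [i for i, v in enumerate(values) if v is not None]
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled

def exponential_smoothing(values, alpha):
    """Fill each gap with the smoothed level s, where
    s = alpha * x + (1 - alpha) * s is updated on observed values only.
    Assumption: s starts at the first observed value."""
    s = next(v for v in values if v is not None)
    filled = []
    for v in values:
        if v is None:
            filled.append(s)
        else:
            s = alpha * v + (1 - alpha) * s
            filled.append(v)
    return filled

def mae(pred, obs):
    """Mean absolute error."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

def rmse(pred, obs):
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def index_of_agreement(pred, obs):
    """Willmott's index of agreement d (1 means perfect agreement)."""
    obar = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for p, o in zip(pred, obs))
    den = sum((abs(p - obar) + abs(o - obar)) ** 2 for p, o in zip(pred, obs))
    return 1 - num / den

# Synthetic hourly series with a 24-hour cycle (illustrative, not real data).
truth = [50 + 20 * math.sin(2 * math.pi * t / 24) for t in range(240)]

# Simulate 10% missing values at random positions (endpoints kept observed).
rng = random.Random(0)
gaps = set(rng.sample(range(1, 239), 24))
holed = [None if t in gaps else x for t, x in enumerate(truth)]

methods = {
    "series mean": series_mean,
    "linear interpolation": linear_interpolation,
    "exp. smoothing (alpha=0.5)": lambda xs: exponential_smoothing(xs, 0.5),
}
for name, impute in methods.items():
    filled = impute(holed)
    # Score only the positions that were deleted.
    pred = [filled[t] for t in sorted(gaps)]
    obs = [truth[t] for t in sorted(gaps)]
    print(f"{name}: MAE={mae(pred, obs):.3f} RMSE={rmse(pred, obs):.3f} "
          f"d={index_of_agreement(pred, obs):.3f}")
```

Scoring only the deleted positions keeps the indicators from being inflated by values that were never missing. The random single-point deletion used here is the simplest case; the gap-length patterns in the study would need a different masking step.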
Pages: 791-804
Number of pages: 14