ESTIMATION OF MISSING VALUES IN AIR POLLUTION DATASET BY USING VARIOUS IMPUTATION METHODS

Authors
Sukatis, Fahren Fazzer [1]
Noor, Norazian Mohamed [1]
Zakaria, Nur Afiqah [1]
Ul-Saufie, Ahmad Zia [2]
Suwardi, Annas [3]
Affiliations
[1] Univ Malaysia Perlis, SERG, Ctr Excellence Geopolymer & Green Technol CEGeoGT, Sch Environm Engn, Kompleks Pusat Pengajian Jejawi 3, Arau 02600, Perlis, Malaysia
[2] Univ Teknol MARA, Fac Comp & Math Sci, Kampus Permatang Pauh, Jalan Permatang Pauh, Permatang Pauh 13500, Pulau Pinang, Malaysia
[3] Univ Negeri Makasar, Fac Math & Nat Sci, Kampus UNM Parangtambung, Jalan Daeng Tata, Makassar, Indonesia
Keywords
Air pollution; Estimation; Missing data; Imputation methods; Simulation; Performance indicators; SINGLE;
DOI
Not available
Chinese Library Classification number
J [Arts];
Discipline classification code
13; 1301;
Abstract
This study aims to determine the best imputation method for filling gaps of various lengths in an air pollution dataset. Ten imputation methods were applied to fill in the missing values: Series Mean, Linear Interpolation, Mean Nearest Neighbour, Expectation Maximization, Markov Chain Monte Carlo, 12-hour Moving Average, 24-hour Moving Average, and Exponential Smoothing (α = 0.2, 0.5, and 0.8). Annual hourly monitoring data for ambient temperature, wind speed, humidity, SO2, NO2, O3, CO, and PM10 from Petaling Jaya and Shah Alam were used for 2012 to 2016. Missing values were simulated in these datasets under three patterns that differ in gap length: simple, medium, and complex. Each pattern was simulated at two percentages of missing data: 10% and 20%. The performance of the imputation methods was evaluated using four performance indicators: mean absolute error, root mean squared error, prediction accuracy, and index of agreement. Overall, Expectation Maximization was the best method for filling the simple, medium, and complex patterns of simulated missing data, while Series Mean was the worst.
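The evaluation workflow described in the abstract (remove known values, impute them, score the imputed values against the truth) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and not the authors' implementation: the synthetic diurnal series stands in for the monitoring data, the gaps are removed at random and do not reproduce the simple, medium, and complex patterns, and only two of the ten methods are shown. The index of agreement follows Willmott's common definition, which the abstract does not spell out; prediction accuracy is omitted because its formula is not given in the abstract.

```python
import math
import random


def series_mean_impute(values):
    """Replace every missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def linear_interpolation_impute(values):
    """Fill each gap along a straight line between its nearest observed neighbours.

    Gaps touching either end of the series take the nearest observed value.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def mae(pred, obs):
    """Mean absolute error between imputed and true values."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    """Root mean squared error between imputed and true values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def index_of_agreement(pred, obs):
    """Willmott's index of agreement (assumed definition); 1.0 is a perfect match."""
    obs_mean = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for p, o in zip(pred, obs))
    den = sum((abs(p - obs_mean) + abs(o - obs_mean)) ** 2 for p, o in zip(pred, obs))
    return 1.0 - num / den


# Hypothetical data: ten days of an hourly signal with a 24-hour cycle.
random.seed(0)
n = 240
truth = [50.0 + 20.0 * math.sin(2.0 * math.pi * t / 24.0) for t in range(n)]

# Remove 10% of the values at random (an illustrative pattern only).
missing = sorted(random.sample(range(n), n // 10))
missing_set = set(missing)
gappy = [None if i in missing_set else truth[i] for i in range(n)]

# Score each method only on the positions that were removed.
obs = [truth[i] for i in missing]
for name, method in (("Series Mean", series_mean_impute),
                     ("Linear Interpolation", linear_interpolation_impute)):
    filled = method(gappy)
    pred = [filled[i] for i in missing]
    print(f"{name}: MAE={mae(pred, obs):.3f}, RMSE={rmse(pred, obs):.3f}, "
          f"d={index_of_agreement(pred, obs):.3f}")
```

Scoring only the removed positions keeps the indicators from being inflated by values that were never missing.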
Pages: 791-804
Page count: 14