ESTIMATION OF MISSING VALUES IN AIR POLLUTION DATASET BY USING VARIOUS IMPUTATION METHODS

Times cited: 0
Authors
Sukatis, Fahren Fazzer [1]
Noor, Norazian Mohamed [1]
Zakaria, Nur Afiqah [1]
Ul-Saufie, Ahmad Zia [2]
Suwardi, Annas [3]
Affiliations
[1] Univ Malaysia Perlis, SERG, Ctr Excellence Geopolymer & Green Technol CEGeoGT, Sch Environm Engn, Kompleks Pusat Pengajian Jejawi 3, Arau 02600, Perlis, Malaysia
[2] Univ Teknol MARA, Fac Comp & Math Sci, Kampus Permatang Pauh, Jalan Permatang Pauh, Permatang Pauh 13500, Pulau Pinang, Malaysia
[3] Univ Negeri Makasar, Fac Math & Nat Sci, Kampus UNM Parangtambung, Jalan Daeng, Tata Makassar, Indonesia
Keywords
Air pollution; Estimation; Missing data; Imputation methods; Simulation; Performance indicators; SINGLE;
DOI
Not available
Chinese Library Classification (CLC) number
J [Art];
Subject classification codes
13; 1301;
Abstract
The aim of this study is to determine the best imputation method for filling gaps of various lengths in an air pollution dataset. Ten imputation methods, namely Series Mean, Linear Interpolation, Mean Nearest Neighbour, Expectation Maximization, Markov Chain Monte Carlo, 12-hour Moving Average, 24-hour Moving Average, and Exponential Smoothing (α = 0.2, 0.5, and 0.8), were applied to fill in the missing values. Hourly monitoring data from 2012 to 2016 for ambient temperature, wind speed, humidity, SO2, NO2, O3, CO, and PM10 at Petaling Jaya and Shah Alam were used. Missing values were simulated in these datasets following three patterns that differ in gap length: simple, medium, and complex. Each pattern was simulated at two percentages of missing data, 10% and 20%. The performance of the imputation methods was evaluated using four performance indicators: mean absolute error, root mean squared error, prediction accuracy, and index of agreement. Overall, Expectation Maximization was selected as the best method for imputing the simple, medium, and complex patterns of simulated missing data, while Series Mean performed worst.
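The abstract names the imputation methods and error indices but gives no formulas. As a rough illustration only, the Python sketch below simulates contiguous gaps in a synthetic hourly series, fills them with the simpler single-imputation methods (Series Mean, Linear Interpolation, 12- and 24-hour Moving Average, and Exponential Smoothing with α = 0.2, 0.5, 0.8), and scores the filled hours with mean absolute error, root mean squared error, and Willmott's index of agreement. Several details are assumptions, not the authors' procedure: the hypothetical gap simulator `simulate_gaps` and its `gap_len` parameter, the trailing-window form of the moving average, and carrying the last smoothed level across a gap. Expectation Maximization, Markov Chain Monte Carlo, Mean Nearest Neighbour, and prediction accuracy are left out because the abstract does not say how they were computed.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence

Series = List[Optional[float]]  # None marks a missing hourly value

def simulate_gaps(values: Sequence[float], fraction: float, gap_len: int,
                  seed: int = 0) -> Series:
    """Hypothetical stand-in for the paper's gap patterns: blank out runs of
    `gap_len` hours until `fraction` of the series is missing. Index 0 and the
    last two values are never blanked, so every method below has anchors."""
    rng = random.Random(seed)
    out: Series = list(values)
    target, missing = round(fraction * len(values)), 0
    while missing < target:
        start = rng.randrange(1, len(values) - gap_len - 1)
        for i in range(start, start + gap_len):
            if out[i] is not None and missing < target:
                out[i] = None
                missing += 1
    return out

def series_mean(x: Series) -> List[float]:
    """Replace every gap with the mean of all observed values."""
    obs = [v for v in x if v is not None]
    m = sum(obs) / len(obs)
    return [m if v is None else v for v in x]

def linear_interpolation(x: Series) -> List[float]:
    """Draw a straight line between the observed values bounding each gap."""
    out = list(x)
    known = [i for i, v in enumerate(x) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            out[i] = x[a] + (i - a) / (b - a) * (x[b] - x[a])
    for i in range(known[0]):  # leading gap: copy the first observation
        out[i] = x[known[0]]
    for i in range(known[-1] + 1, len(x)):  # trailing gap: copy the last one
        out[i] = x[known[-1]]
    return out

def moving_average(x: Series, window: int) -> List[float]:
    """Assumed variant: fill a gap with the mean of the previous `window` hours,
    counting values already imputed. Requires x[0] to be observed."""
    out: List[float] = []
    for i, v in enumerate(x):
        if v is None:
            past = out[max(0, i - window):i]
            v = sum(past) / len(past)
        out.append(v)
    return out

def exponential_smoothing(x: Series, alpha: float) -> List[float]:
    """Assumed variant: update s = alpha*x + (1 - alpha)*s on observed hours and
    carry the current smoothed level s across gaps. Requires x[0] observed."""
    out: List[float] = []
    level = x[0]
    for v in x:
        if v is None:
            out.append(level)
        else:
            level = alpha * v + (1 - alpha) * level
            out.append(v)
    return out

def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)

def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def index_of_agreement(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Willmott's d: 1 means perfect agreement, 0 means none."""
    om = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - om) + abs(o - om)) ** 2 for o, p in zip(obs, pred))
    return 1 - num / den

if __name__ == "__main__":
    # Synthetic series with a daily cycle; NOT the Petaling Jaya / Shah Alam data.
    truth = [30 + 10 * math.sin(2 * math.pi * h / 24) for h in range(24 * 30)]
    gappy = simulate_gaps(truth, fraction=0.2, gap_len=6, seed=1)
    gaps = [i for i, v in enumerate(gappy) if v is None]
    methods: Dict[str, Callable[[Series], List[float]]] = {
        "Series Mean": series_mean,
        "Linear Interpolation": linear_interpolation,
        "12-hour Moving Average": lambda s: moving_average(s, 12),
        "24-hour Moving Average": lambda s: moving_average(s, 24),
    }
    for a in (0.2, 0.5, 0.8):
        methods[f"Exponential Smoothing a={a}"] = (
            lambda s, a=a: exponential_smoothing(s, a))
    for name, impute in methods.items():
        filled = impute(gappy)
        o = [truth[i] for i in gaps]  # score only the artificially removed hours
        p = [filled[i] for i in gaps]
        print(f"{name:30s} MAE={mae(o, p):.3f} RMSE={rmse(o, p):.3f} "
              f"IOA={index_of_agreement(o, p):.3f}")
```

Running it as a script prints one line of scores per method for the synthetic series; those numbers say nothing about how the methods ranked on the study's data. Scoring only the blanked hours, rather than the whole series, keeps the untouched observed values from inflating the indices.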
Pages: 791-804
Number of pages: 14