ESTIMATION OF MISSING VALUES IN AIR POLLUTION DATASET BY USING VARIOUS IMPUTATION METHODS

Authors
Sukatis, Fahren Fazzer [1]
Noor, Norazian Mohamed [1]
Zakaria, Nur Afiqah [1]
Ul-Saufie, Ahmad Zia [2]
Suwardi, Annas [3]
Affiliations
[1] Univ Malaysia Perlis, SERG, Ctr Excellence Geopolymer & Green Technol CEGeoGT, Sch Environm Engn, Kompleks Pusat Pengajian Jejawi 3, Arau 02600, Perlis, Malaysia
[2] Univ Teknol MARA, Fac Comp & Math Sci, Kampus Permatang Pauh, Jalan Permatang Pauh, Permatang Pauh 13500, Pulau Pinang, Malaysia
[3] Univ Negeri Makasar, Fac Math & Nat Sci, Kampus UNM Parangtambung, Jalan Daeng Tata, Makassar, Indonesia
Keywords
Air pollution; Estimation; Missing data; Imputation methods; Simulation; Performance indicators; SINGLE
DOI
Not available
Chinese Library Classification number
J [Arts]
Subject classification code
13; 1301
Abstract
The aim of this study is to determine the best imputation method for filling gaps of various lengths in an air pollution dataset. Ten imputation methods, namely Series Mean, Linear Interpolation, Mean Nearest Neighbour, Expectation Maximization, Markov Chain Monte Carlo, 12-hour Moving Average, 24-hour Moving Average, and Exponential Smoothing (α = 0.2, 0.5 and 0.8), were applied to fill in the missing values. Hourly monitoring data for ambient temperature, wind speed, humidity, SO2, NO2, O3, CO and PM10 from Petaling Jaya and Shah Alam covering 2012 to 2016 were used. Missing values were simulated in these datasets according to three missing-data patterns that differ in gap length (simple, medium and complex), and each pattern was simulated at two missing-data percentages, 10% and 20%. The performance of the imputation methods was evaluated using four performance indicators: mean absolute error, root mean squared error, prediction accuracy and index of agreement. Overall, Expectation Maximization was selected as the best imputation method for the simple, medium and complex patterns of simulated missing data, while Series Mean performed worst.
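The simpler single-imputation methods and the four performance indicators named in the abstract can be sketched as follows. This is a minimal, illustrative Python sketch, not the authors' implementation. Several details are assumptions: the moving average uses a trailing window over earlier (observed or already imputed) values, exponential smoothing fills a gap with the last smoothed level, prediction accuracy (PA) is taken as the ratio of the spread of predictions to the spread of observations about the observed mean, and `simulate_gaps` is a hypothetical helper whose gap scheme is not the study's simple/medium/complex definition. Expectation Maximization and Markov Chain Monte Carlo are omitted because they require a fitted statistical model and go beyond a short sketch.

```python
import math
import random
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing hourly reading


def _observed_mean(x: Series) -> float:
    obs = [v for v in x if v is not None]  # assumes at least one observation
    return sum(obs) / len(obs)


def series_mean(x: Series) -> List[float]:
    """Series Mean: every gap gets the mean of all observed values."""
    m = _observed_mean(x)
    return [m if v is None else v for v in x]


def linear_interpolation(x: Series) -> List[float]:
    """Straight line between nearest observed neighbours; end gaps copy the nearest value."""
    idx = [i for i, v in enumerate(x) if v is not None]
    out: List[float] = []
    for i in range(len(x)):
        left = max((j for j in idx if j <= i), default=None)
        right = min((j for j in idx if j >= i), default=None)
        if left is None or right is None:
            out.append(x[right if left is None else left])
        elif left == right:
            out.append(x[i])  # observed value, kept as-is
        else:
            out.append(x[left] + (i - left) / (right - left) * (x[right] - x[left]))
    return out


def moving_average(x: Series, window: int) -> List[float]:
    """Assumed form: gap = mean of the preceding `window` values (observed or imputed)."""
    m = _observed_mean(x)
    out: List[float] = []
    for i, v in enumerate(x):
        prev = out[max(0, i - window):i]
        out.append(v if v is not None else (sum(prev) / len(prev) if prev else m))
    return out


def exponential_smoothing(x: Series, alpha: float) -> List[float]:
    """Level s_t = alpha*x_t + (1 - alpha)*s_{t-1}; a gap gets the forecast s_{t-1}."""
    level: Optional[float] = None
    out: List[float] = []
    for v in x:
        if v is None:
            # level is frozen across a gap; a leading gap starts from the observed mean
            level = _observed_mean(x) if level is None else level
            out.append(level)
        else:
            level = v if level is None else alpha * v + (1 - alpha) * level
            out.append(v)
    return out


def mae(o: List[float], p: List[float]) -> float:
    return sum(abs(pi - oi) for oi, pi in zip(o, p)) / len(o)


def rmse(o: List[float], p: List[float]) -> float:
    return math.sqrt(sum((pi - oi) ** 2 for oi, pi in zip(o, p)) / len(o))


def prediction_accuracy(o: List[float], p: List[float]) -> float:
    # Assumed form: sum (P_i - mean O)^2 / sum (O_i - mean O)^2, ideal value 1
    ob = sum(o) / len(o)
    return sum((pi - ob) ** 2 for pi in p) / sum((oi - ob) ** 2 for oi in o)


def index_of_agreement(o: List[float], p: List[float]) -> float:
    # Willmott's d = 1 - sum (P_i - O_i)^2 / sum (|P_i - mean O| + |O_i - mean O|)^2
    ob = sum(o) / len(o)
    num = sum((pi - oi) ** 2 for oi, pi in zip(o, p))
    den = sum((abs(pi - ob) + abs(oi - ob)) ** 2 for oi, pi in zip(o, p))
    return 1 - num / den


def simulate_gaps(x: List[float], fraction: float, gap_len: int, seed: int = 0) -> Series:
    """Hypothetical scheme: blank non-overlapping runs of `gap_len` hours up to `fraction`."""
    rng = random.Random(seed)
    out: Series = list(x)
    missing, target = 0, round(fraction * len(x))
    for _ in range(100_000):  # attempt cap so the loop always terminates
        if missing + gap_len > target:
            break
        start = rng.randrange(len(x) - gap_len + 1)
        if all(v is not None for v in out[start:start + gap_len]):
            out[start:start + gap_len] = [None] * gap_len
            missing += gap_len
    return out
```

In use, one would blank out a complete series with `simulate_gaps` (for example with `fraction=0.10` or `0.20`), fill it with each method, and compute MAE, RMSE, PA and IA only at the blanked positions by comparing the imputed values with the withheld originals. Lower MAE and RMSE, and PA and IA closer to 1, indicate better imputation. The interpolation helper scans all observed indices for every position, so it is written for clarity rather than speed.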
Pages: 791-804
Number of pages: 14