ESTIMATION OF MISSING VALUES IN AIR POLLUTION DATASET BY USING VARIOUS IMPUTATION METHODS

Times cited: 0
Authors
Sukatis, Fahren Fazzer [1]
Noor, Norazian Mohamed [1]
Zakaria, Nur Afiqah [1]
Ul-Saufie, Ahmad Zia [2]
Suwardi, Annas [3]
Affiliations
[1] Univ Malaysia Perlis, SERG, Ctr Excellence Geopolymer & Green Technol CEGeoGT, Sch Environm Engn, Kompleks Pusat Pengajian Jejawi 3, Arau 02600, Perlis, Malaysia
[2] Univ Teknol MARA, Fac Comp & Math Sci, Kampus Permatang Pauh,Jalan Permatang Pauh, Permatang Pauh 13500, Pulau Pinang, Malaysia
[3] Univ Negeri Makasar, Fac Math & Nat Sci, Kampus UNM Parangtambung,Jalan Daeng, Tata Makassar, Indonesia
Keywords
Air pollution; Estimation; Missing data; Imputation methods; Simulation; Performance indicators; SINGLE;
DOI
Not available
Chinese Library Classification number
J [Arts];
Discipline classification code
13 ; 1301 ;
Abstract
The aim of this study is to determine the best imputation method for filling gaps of various lengths in an air pollution dataset. Ten imputation methods were applied to fill in the missing values: Series Mean, Linear Interpolation, Mean Nearest Neighbour, Expectation Maximization, Markov Chain Monte Carlo, 12-hour Moving Average, 24-hour Moving Average, and Exponential Smoothing (α = 0.2, 0.5, and 0.8). Hourly monitoring data for ambient temperature, wind speed, humidity, SO2, NO2, O3, CO, and PM10 from Petaling Jaya and Shah Alam for 2012 to 2016 were used. Missing data were simulated in these datasets in three patterns that differ in gap length, i.e. simple, medium, and complex. Each pattern was simulated at two percentages of missing data, i.e. 10% and 20%. The performance of the imputation methods was evaluated using four performance indicators: mean absolute error, root mean squared error, prediction accuracy, and index of agreement. Overall, the Expectation Maximization method performed best in filling the simple, medium, and complex patterns of simulated missing data, while the Series Mean method performed worst.
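The evaluation procedure described in the abstract (remove known values, impute them, then score the estimates against the withheld truth) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code: the synthetic hourly series, the random single-hour gap placement, the 10% missing rate on a 240-hour window, and the function names are hypothetical. Only two of the ten methods (Series Mean and Linear Interpolation) and three of the four indicators are shown; the index of agreement uses Willmott's standard form, which may differ from the exact formula in the paper, and prediction accuracy is omitted because the abstract does not define it.

```python
import numpy as np


def series_mean(x):
    """Replace every NaN with the mean of the observed values."""
    filled = x.copy()
    filled[np.isnan(x)] = np.nanmean(x)
    return filled


def linear_interpolation(x):
    """Fill NaNs on a straight line between the nearest observed neighbours."""
    idx = np.arange(len(x))
    observed = ~np.isnan(x)
    return np.interp(idx, idx[observed], x[observed])


def mae(obs, pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(pred - obs)))


def rmse(obs, pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def index_of_agreement(obs, pred):
    """Willmott's index of agreement (assumed form); 1 means a perfect match."""
    obs_mean = obs.mean()
    denom = np.sum((np.abs(pred - obs_mean) + np.abs(obs - obs_mean)) ** 2)
    return float(1.0 - np.sum((pred - obs) ** 2) / denom)


# Synthetic stand-in for an hourly pollutant series (not real monitoring data):
# a 24-hour cycle plus noise over ten days.
rng = np.random.default_rng(0)
hours = np.arange(240)
truth = 50 + 20 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 2, hours.size)

# Withhold 10% of the values at random interior positions.
mask = np.zeros(hours.size, dtype=bool)
mask[rng.choice(np.arange(1, hours.size - 1), size=24, replace=False)] = True
gappy = truth.copy()
gappy[mask] = np.nan

# Score each method only on the positions that were withheld.
for name, method in [("series mean", series_mean),
                     ("linear interpolation", linear_interpolation)]:
    est = method(gappy)
    print(f"{name}: MAE={mae(truth[mask], est[mask]):.2f} "
          f"RMSE={rmse(truth[mask], est[mask]):.2f} "
          f"d={index_of_agreement(truth[mask], est[mask]):.3f}")
```

Scoring only the withheld positions keeps the indicators from being inflated by the values that were never missing. Longer consecutive gaps, as in the medium and complex patterns, could be simulated by masking contiguous runs of hours instead of isolated ones.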
Pages: 791-804
Page count: 14
Related papers
50 in total
  • [11] Imputation for missing values and corresponding variance estimation
    Sitter, RR
    Rao, JNK
    CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 1997, 25 (01): : 61 - 73
  • [12] Evaluating Imputation Methods for Missing Data in a MCI Dataset
    Gomez-Valades Batanero, Alba
    Rincon Zamorano, Mariano
    Martinez Tomas, Rafael
    Guerrero Martin, Juan
    ARTIFICIAL INTELLIGENCE IN NEUROSCIENCE: AFFECTIVE ANALYSIS AND HEALTH APPLICATIONS, PT I, 2022, 13258 : 446 - 454
  • [13] Comparison of Missing Data Imputation Methods using the Framingham Heart study dataset
    Psychogyios, Konstantinos
    Ilias, Loukas
    Askounis, Dimitris
    2022 IEEE-EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS (BHI) JOINTLY ORGANISED WITH THE IEEE-EMBS INTERNATIONAL CONFERENCE ON WEARABLE AND IMPLANTABLE BODY SENSOR NETWORKS (BSN'22), 2022,
  • [14] Imputation of missing values for compositional data using classical and robust methods
    Hron, K.
    Templ, M.
    Filzmoser, P.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2010, 54 (12) : 3095 - 3107
  • [15] Missing values imputation methods of air pollution data: An optimal method applied to sulpher dioxide levels and asthma hospital admissions
    Weerasinghe, S
    EPIDEMIOLOGY, 2002, 13 (04) : S256 - S256
  • [16] Missing Data Imputation in Time Series of Air Pollution
    Junger, Washington
    de Leon, Antonio Ponce
    EPIDEMIOLOGY, 2009, 20 (06) : S87 - S87
  • [17] FINNIM: Iterative Imputation of Missing Values in Dissolved Gas Analysis Dataset
    Sahri, Zahriah
    Yusof, Rubiyah
    Watada, Junzo
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2014, 10 (04) : 2093 - 2102
  • [18] Experimental analysis of methods for imputation of missing values in databases
    Farhangfar, A
    Kurgan, L
    Pedrycz, W
    INTELLIGENT COMPUTING: THEORY AND APPLICATIONS II, 2004, 5421 : 172 - 182
  • [19] Estimation of Missing Rainfall Data Using Spatial Interpolation and Imputation Methods
    Radia, Noor Fadhilah Ahmad
    Zakaria, Roslinazairimah
    Azman, Muhammad Az-Zuhri
    2ND ISM INTERNATIONAL STATISTICAL CONFERENCE 2014 (ISM-II): EMPOWERING THE APPLICATIONS OF STATISTICAL AND MATHEMATICAL SCIENCES, 2015, 1643 : 42 - 48
  • [20] Comparison of Interpolation, Statistical, and Data-Driven Methods for Imputation of Missing Values in a Distributed Soil Moisture Dataset
    Kornelsen, Kurt
    Coulibaly, Paulin
    JOURNAL OF HYDROLOGIC ENGINEERING, 2014, 19 (01) : 26 - 43