Machine Learning Approach-based Big Data Imputation Methods for Outdoor Air Quality forecasting

Cited by: 1
Authors
Narasimhan, D. [1]
Vanitha, M. [2]
Affiliations
[1] SASTRA Deemed Univ, Dept Math, Kumbakonam 612001, Tamil Nadu, India
[2] SASTRA Deemed Univ, Srinivasa Ramanujan Ctr, Dept Comp Sci & Engn, Kumbakonam 612001, Tamil Nadu, India
Keywords
Air quality; Big data analytics; Classification; Ensemble; Multiple imputation
DOI
10.56042/jsir.v82i03.71764
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
Missing data is a common problem in ambient air-quality databases, and it is considerably worse for small towns and cities. It is a significant concern for environmental epidemiology: these settings have high pollution exposure levels worldwide, and gaps in their datasets obstruct health investigations that could later inform local and international policy. When a substantial number of observations contain missing values, the effective sample size shrinks and standard errors increase, which may significantly affect the final result. In general, the performance of missing-value imputation algorithms depends on the size of the database and the percentage of missing values in it. This paper proposes and demonstrates an ensemble-imputation-classification framework that reconstructs air quality information, using a dataset from Beijing, China, in order to forecast air quality. Various single and multiple imputation procedures are used to fill in the missing records. An ensemble of diverse classifiers is then applied to the imputed data to determine the air pollution level. The proposed model aims to reduce the error rate and improve accuracy. Extensive testing on datasets with actual missing values shows that the proposed methodology, which combines multiple imputation with ensemble techniques, significantly improves the accuracy of the air quality forecasting model compared with conventional single imputation techniques.
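The impute-then-classify idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name specific imputers or classifiers, so the random hot-deck draw, the nearest-centroid classifier, the majority vote, all function names, and the toy readings below are assumptions chosen only to keep the example self-contained.

```python
import random
from collections import Counter
from statistics import mean


def impute_once(rows, rng):
    """Fill each missing value (None) with a random draw from the
    observed values of the same column (a simple hot-deck draw;
    an assumption, not the paper's imputer)."""
    n_cols = len(rows[0])
    observed = [[r[j] for r in rows if r[j] is not None] for j in range(n_cols)]
    return [
        [v if v is not None else rng.choice(observed[j]) for j, v in enumerate(r)]
        for r in rows
    ]


def fit_centroids(rows, labels):
    """Nearest-centroid classifier: one mean feature vector per class."""
    centroids = {}
    for label in set(labels):
        members = [r for r, y in zip(rows, labels) if y == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids


def predict(centroids, x):
    """Return the class whose centroid is closest to x (squared distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def ensemble_predict(rows, labels, x, m=5, seed=0):
    """Build m imputed copies of the data, fit one classifier per copy,
    and combine the m predictions by majority vote."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(m):
        completed = impute_once(rows, rng)
        votes[predict(fit_centroids(completed, labels), x)] += 1
    return votes.most_common(1)[0][0]


# Toy readings (made-up values, two features per row); None marks a gap.
rows = [
    [12.0, 20.0], [15.0, None], [None, 25.0],
    [110.0, 80.0], [None, 90.0], [130.0, None],
]
labels = ["good", "good", "good", "poor", "poor", "poor"]

print(ensemble_predict(rows, labels, [14.0, 22.0]))
print(ensemble_predict(rows, labels, [120.0, 85.0]))
```

Drawing several completed datasets instead of one lets the uncertainty about the missing entries show up as disagreement between the fitted models, which the vote then averages out. A single imputation would hide that uncertainty.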
Pages: 338-347
Number of pages: 10