Development of a regional feature selection-based machine learning system (RFSML v1.0) for air pollution forecasting over China

Cited by: 6
Authors
Fang, Li [1]
Jin, Jianbing [1]
Segers, Arjo [2]
Lin, Hai Xiang [3,4]
Pang, Mijie [1]
Xiao, Cong [5]
Deng, Tuo [4]
Liao, Hong [1]
Affiliations
[1] Nanjing Univ Informat Sci & Technol, Jiangsu Collaborat Innovat Ctr Atmospher Environm, Sch Environm Sci & Engn, Jiangsu Key Lab Atmospher Environm Monitoring & P, Nanjing, Jiangsu, Peoples R China
[2] TNO, Dept Climate Air & Sustainabil, Utrecht, Netherlands
[3] Leiden Univ, Inst Environm Sci, Leiden, Netherlands
[4] Delft Univ Technol, Delft Inst Appl Math, Delft, Netherlands
[5] China Univ Petr, Key Lab Petr Engn, Minist Educ, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
MEMORY NEURAL-NETWORK; YANGTZE-RIVER DELTA; NORTH CHINA; PM2.5; CONCENTRATIONS; EASTERN CHINA; SICHUAN BASIN; PREDICTION; EMISSION; MODEL; TRANSPORT;
DOI
10.5194/gmd-15-7791-2022
Chinese Library Classification
P [Astronomy, Earth Sciences];
Discipline classification code
07;
Abstract
With the explosive growth of atmospheric data, machine learning models have achieved great success in air pollution forecasting because of their higher computational efficiency than the traditional chemical transport models. However, in previous studies, new prediction algorithms have only been tested at stations or in a small region; a large-scale air quality forecasting model remains lacking to date. Huge dimensionality also means that redundant input data may lead to increased complexity and therefore the over-fitting of machine learning models. Feature selection is a key topic in machine learning development, but it has not yet been explored in atmosphere-related applications. In this work, a regional feature selection-based machine learning (RFSML) system was developed, which is capable of predicting air quality in the short term with high accuracy at the national scale. Ensemble-Shapley additive global importance analysis is combined with the RFSML system to extract significant regional features and eliminate redundant variables at an affordable computational expense. The significance of the regional features is also explained physically. Compared with a standard machine learning system fed with relative features, the RFSML system driven by the selected key features results in superior interpretability, less training time, and more accurate predictions. This study also provides insights into the difference in interpretability among machine learning models (i.e., random forest, gradient boosting, and multi-layer perceptron models).
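The selection idea in the abstract (rank features by a Shapley-based global importance, then keep only the key ones) can be illustrated with a small sketch. This is not the RFSML implementation or the authors' ensemble procedure: the synthetic data, the fixed linear stand-in `model`, the mean-imputation of masked features, and all function names are assumptions made for illustration, and the exact enumeration over feature subsets used here is only practical for a handful of features.

```python
import itertools
import math
import random


def make_data(n=400, seed=0):
    """Synthetic data (assumption): target depends on features 0 and 1 only."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(4)]
        y.append(3.0 * row[0] + 1.0 * row[1] + rng.gauss(0, 0.1))
        X.append(row)
    return X, y


def model(row):
    """Hypothetical stand-in for a trained model (not the paper's RF/GB/MLP)."""
    return 3.0 * row[0] + 1.0 * row[1]


def mse_with_subset(X, y, subset, means):
    """MSE when only features in `subset` are known.

    Features outside the subset are replaced by their column mean,
    a crude way of marginalising them out.
    """
    total = 0.0
    for row, target in zip(X, y):
        masked = [row[j] if j in subset else means[j] for j in range(len(row))]
        total += (model(masked) - target) ** 2
    return total / len(X)


def global_shapley_importance(X, y):
    """Exact Shapley values of the loss reduction v(S) = MSE(no features) - MSE(S)."""
    d = len(X[0])
    means = [sum(r[j] for r in X) / len(X) for j in range(d)]
    base = mse_with_subset(X, y, set(), means)
    # Evaluate the value function on every subset of features (2^d subsets).
    value = {}
    for r in range(d + 1):
        for S in itertools.combinations(range(d), r):
            value[frozenset(S)] = base - mse_with_subset(X, y, set(S), means)
    phi = [0.0] * d
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for r in range(d):
            # Shapley weight for a coalition of size r that excludes feature j.
            w = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
            for S in itertools.combinations(others, r):
                S = frozenset(S)
                phi[j] += w * (value[S | {j}] - value[S])
    return phi


def select_top_k(phi, k):
    """Indices of the k features with the largest global importance."""
    ranked = sorted(range(len(phi)), key=lambda j: phi[j], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    X, y = make_data()
    phi = global_shapley_importance(X, y)
    print("importance per feature:", [round(p, 3) for p in phi])
    print("selected features:", select_top_k(phi, 2))
```

In this toy setting the two features the model never uses receive zero importance and are dropped, which mirrors the abstract's goal of eliminating redundant inputs before training; a realistic system would need a sampling-based approximation rather than full enumeration.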
Pages: 7791-7807
Number of pages: 17
Related papers
10 items in total
  • [1] MSDM v1.0: A machine learning model for precipitation nowcasting over eastern China using multisource data
    Li, Dawei
    Liu, Yudi
    Chen, Chaohui
    GEOSCIENTIFIC MODEL DEVELOPMENT, 2021, 14 (06) : 4019 - 4034
  • [2] A Regional multi-Air Pollutant Assimilation System (RAPAS v1.0) for emission estimates: system development and application
    Feng, Shuzhuang
    Jiang, Fei
    Wu, Zheng
    Wang, Hengmao
    He, Wei
    Shen, Yang
    Zhang, Lingyu
    Zheng, Yanhua
    Lou, Chenxi
    Jiang, Ziqiang
    Ju, Weimin
    GEOSCIENTIFIC MODEL DEVELOPMENT, 2023, 16 (20) : 5949 - 5977
  • [3] Optimal Feature Selection-Based Dental Caries Prediction Model Using Machine Learning for Decision Support System
    Kang, In-Ae
    Njimbouom, Soualihou Ngnamsie
    Kim, Jeong-Dong
    BIOENGINEERING-BASEL, 2023, 10 (02):
  • [4] Feature Selection-Based Evaluation for Network Intrusion Detection System with Machine Learning Methods on CICIDS2017
    Upadhyay, Lav
    Tripathi, Meenakshi
    Grover, Jyoti
    COMMUNICATION AND INTELLIGENT SYSTEMS, VOL 3, ICCIS 2023, 2024, 969 : 345 - 356
  • [5] Development and application of an automated air quality forecasting system based on machine learning
    Ke, Huabing
    Gong, Sunling
    He, Jianjun
    Zhang, Lei
    Cui, Bin
    Wang, Yaqiang
    Mo, Jingyue
    Zhou, Yike
    Zhang, Huan
    SCIENCE OF THE TOTAL ENVIRONMENT, 2022, 806
  • [6] Hybrid feature selection-based machine learning Classification system for the prediction of injury severity in single and multiple-vehicle accidents
    Zhang, Shuguang
    Khattak, Afaq
    Matara, Caroline Mongina
    Hussain, Arshad
    Farooq, Asim
    PLOS ONE, 2022, 17 (02):
  • [7] A gridded air quality forecast through fusing site-available machine learning predictions from RFSML v1.0 and chemical transport model results from GEOS-Chem v13.1.0 using the ensemble Kalman filter
    Fang, Li
    Jin, Jianbing
    Segers, Arjo
    Liao, Hong
    Li, Ke
    Xu, Bufan
    Han, Wei
    Pang, Mijie
    Lin, Hai Xiang
    GEOSCIENTIFIC MODEL DEVELOPMENT, 2023, 16 (16) : 4867 - 4882
  • [8] Hybrid machine learning system based on multivariate data decomposition and feature selection for improved multitemporal evapotranspiration forecasting
    Lee, Jinwook
    Bateni, Sayed M.
    Jun, Changhyun
    Heggy, Essam
    Jamei, Mehdi
    Kim, Dongkyun
    Ghafouri, Hamid Reza
    Deenik, Jonathan L.
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 135
  • [9] Evaluation of Seasonal Prediction of Extreme Wind Resource Potential over China Based on a Dynamic Prediction System SIDRI-ESS V1.0
    Yan, Zixiang
    Li, Jinxiao
    Zhou, Wen
    Lin, Zouxing
    Zang, Yuxin
    Li, Siyuan
    ATMOSPHERE, 2024, 15 (09)
  • [10] An innovative ensemble learning air pollution early-warning system for China based on incremental extreme learning machine
    Du, Zongjuan
    Heng, Jiani
    Niu, Mingfei
    Sun, Shaolong
    ATMOSPHERIC POLLUTION RESEARCH, 2021, 12 (09)