Deep Ensemble Machine Learning Framework for the Estimation of PM2.5 Concentrations

Cited by: 31
Authors
Yu, Wenhua [1]
Li, Shanshan [1]
Ye, Tingting [1]
Xu, Rongbin [1]
Song, Jiangning [2]
Guo, Yuming [1]
Affiliations
[1] Monash Univ, Sch Publ Hlth & Prevent Med, Climate Air Qual Res Unit, Melbourne, Vic, Australia
[2] Monash Univ, Monash Biomed Discovery Inst, Dept Biochem & Mol Biol, Melbourne, Vic, Australia
Funding
UK Medical Research Council; Australian Research Council; Australian National Health and Medical Research Council
Keywords
LAND-USE REGRESSION; SATELLITE DATA; AIR-POLLUTION; MODEL; PM10; PREDICTION; MORTALITY; EXPOSURE; RISK;
DOI
10.1289/EHP9752
Chinese Library Classification number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
BACKGROUND: Accurate estimation of historical PM2.5 (particulate matter with an aerodynamic diameter of less than 2.5 μm) concentrations is essential for environmental health risk assessment.
OBJECTIVES: The aim of this study was to develop a multilevel stacked ensemble machine learning framework to improve the estimation of daily ground-level PM2.5 concentrations.
METHODS: A deep ensemble machine learning (DEML) framework was developed to estimate daily PM2.5 concentrations. The framework has a three-stage structure. At the first stage, four base models [gradient boosting machine (GBM), support vector machine (SVM), random forest (RF), and eXtreme gradient boosting (XGBoost)] were used to generate a new data set of PM2.5 concentrations for training the next-stage learners. At the second stage, three meta-models [RF, XGBoost, and generalized linear model (GLM)] were used to estimate PM2.5 concentrations from a combination of the original data set and the predictions of the first-stage models. At the third stage, a nonnegative least squares (NNLS) algorithm was employed to obtain the optimal weights for PM2.5 estimation. Using data from 133 monitoring stations in Italy as an example, we implemented DEML to predict daily PM2.5 at each 1 km × 1 km grid cell across Italy from 2015 to 2019. We evaluated model performance with 10-fold cross-validation (CV) and compared it with five benchmark algorithms [GBM, SVM, RF, XGBoost, and Super Learner (SL)].
RESULTS: The PM2.5 prediction performance of DEML [coefficient of determination (R²) = 0.87 and root mean square error (RMSE) = 5.38 μg/m³] was superior to that of every benchmark model (R² of 0.51, 0.76, 0.83, 0.70, and 0.83 for the GBM, SVM, RF, XGBoost, and SL approaches, respectively). DEML displayed reliable performance in capturing the spatiotemporal variations of PM2.5 in Italy.
DISCUSSION: The proposed DEML framework achieved outstanding performance in PM2.5 estimation and could be used as a tool for more accurate environmental exposure assessment.
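The third stage and the two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the NNLS weights are used to blend the cross-validated predictions of the second-stage meta-models, it uses a simple cyclic coordinate-descent solver rather than whichever NNLS routine the paper used, it computes R² as 1 minus the ratio of residual to total sum of squares (one common definition), and all numbers in it are invented for illustration. The function names are hypothetical.

```python
import math


def nnls_coordinate_descent(columns, y, max_sweeps=20000, tol=1e-12):
    """Minimise ||sum_j w_j * columns[j] - y||^2 subject to every w_j >= 0.

    columns[j] holds the predictions of learner j for each sample.
    Cyclic coordinate descent is an assumption made for brevity here.
    """
    weights = [0.0] * len(columns)
    residual = list(y)  # y minus the current weighted blend
    norms = [sum(c * c for c in col) for col in columns]
    for _ in range(max_sweeps):
        largest_step = 0.0
        for j, col in enumerate(columns):
            if norms[j] == 0.0:
                continue
            # Best w_j with the other weights fixed, clipped at zero.
            slope = sum(c * r for c, r in zip(col, residual))
            new_w = max(0.0, weights[j] + slope / norms[j])
            step = new_w - weights[j]
            if step != 0.0:
                residual = [r - step * c for r, c in zip(residual, col)]
                weights[j] = new_w
                largest_step = max(largest_step, abs(step))
        if largest_step < tol:
            break
    return weights


def rmse(y_true, y_pred):
    """Root mean square error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented daily PM2.5 values (ug/m3) and invented out-of-fold predictions
# from three hypothetical second-stage meta-models.
observed = [12.0, 18.5, 25.0, 9.5, 30.0, 15.0]
stage2_predictions = {
    "RF": [13.1, 17.0, 23.5, 11.0, 27.5, 15.8],
    "XGBoost": [11.2, 19.9, 26.4, 8.1, 31.9, 13.6],
    "GLM": [15.0, 16.0, 21.0, 13.0, 24.0, 17.0],
}

columns = list(stage2_predictions.values())
weights = nnls_coordinate_descent(columns, observed)
blend = [
    sum(w * col[i] for w, col in zip(weights, columns))
    for i in range(len(observed))
]

for name, w in zip(stage2_predictions, weights):
    print(f"{name}: weight {w:.3f}")
print(f"blend RMSE {rmse(observed, blend):.2f}, R2 {r_squared(observed, blend):.2f}")
```

Because the weights are constrained to be nonnegative, a meta-model that does not help the blend receives a weight of zero instead of a negative coefficient. In a real analysis the prediction columns would come from held-out folds of the cross-validation, not from values fitted on the training data.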
Pages: 11
Related papers
50 in total
  • [1] Comment on "Deep Ensemble Machine Learning Framework for the Estimation of PM2.5 Concentrations"
    Stafoggia, Massimo
    Cattani, Giorgio
    Ancona, Carla
    Gasparrini, Antonio
    Ranzi, Andrea
    [J]. ENVIRONMENTAL HEALTH PERSPECTIVES, 2022, 130 (06)
  • [2] Response to "Comment on 'Deep Ensemble Machine Learning Framework for the Estimation of PM2.5 Concentrations'"
    Yu, Wenhua
    Li, Shanshan
    Ye, Tingting
    Xu, Rongbin
    Song, Jiangning
    Guo, Yuming
    [J]. ENVIRONMENTAL HEALTH PERSPECTIVES, 2022, 130 (06)
  • [3] A Machine Learning-Based Ensemble Framework for Forecasting PM2.5 Concentrations in Puli, Taiwan
    Yin, Peng-Yeng
    Yen, Alex Yaning
    Chao, Shou-En
    Day, Rong-Fuh
    Bhanu, Bir
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (05)
  • [4] An Ensemble Deep Learning Model for Forecasting Hourly PM2.5 Concentrations
    Mohan, Anju S.
    Abraham, Lizy
    [J]. IETE JOURNAL OF RESEARCH, 2023, 69 (10) : 6832 - 6845
  • [5] Machine learning and deep learning modeling and simulation for predicting PM2.5 concentrations
    Peng, Jian
    Han, Haisheng
    Yi, Yong
    Huang, Huimin
    Xie, Le
    [J]. CHEMOSPHERE, 2022, 308
  • [6] Forecasting hourly PM2.5 concentrations based on decomposition-ensemble-reconstruction framework incorporating deep learning algorithms
    Cai P.
    Zhang C.
    Chai J.
    [J]. Data Science and Management, 2023, 6 (01) : 46 - 54
  • [7] Estimation of PM2.5 Concentrations in New York State: Understanding the Influence of Vertical Mixing on Surface PM2.5 Using Machine Learning
    Hung, Wei-Ting
    Lu, Cheng-Hsuan
    Alessandrini, Stefano
    Kumar, Rajesh
    Lin, Chin-An
    [J]. ATMOSPHERE, 2020, 11 (12) : 1 - 21
  • [8] Deep learning PM2.5 concentrations with bidirectional LSTM RNN
    Weitian Tong
    Lixin Li
    Xiaolu Zhou
    Andrew Hamilton
    Kai Zhang
    [J]. Air Quality, Atmosphere & Health, 2019, 12 : 411 - 423
  • [9] Deep learning PM2.5 concentrations with bidirectional LSTM RNN
    Tong, Weitian
    Li, Lixin
    Zhou, Xiaolu
    Hamilton, Andrew
    Zhang, Kai
    [J]. AIR QUALITY ATMOSPHERE AND HEALTH, 2019, 12 (04) : 411 - 423
  • [10] ESTIMATION OF ATMOSPHERIC PM2.5 BASED ON PHOTOS AND DEEP LEARNING
    Tan, Siyu
    Yuan, Qiangqiang
    [J]. 2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022 : 7984 - 7987