Using a land use regression model with machine learning to estimate ground level PM2.5

Cited by: 84
Authors
Wong, Pei-Yi [1]
Lee, Hsiao-Yun [2]
Chen, Yu-Cheng [3]
Zeng, Yu-Ting [4]
Chern, Yinq-Rong [4]
Chen, Nai-Tzu [5]
Lung, Shih-Chun Candice [6, 7, 8]
Su, Huey-Jen [1]
Wu, Chih-Da [3, 4]
Affiliations
[1] Natl Cheng Kung Univ, Dept Environm & Occupat Hlth, Tainan, Taiwan
[2] Natl Taipei Univ Nursing & Hlth Sci, Dept Leisure Ind & Hlth Promot, Taipei, Taiwan
[3] Natl Hlth Res Inst, Natl Inst Environm Hlth Sci, Miaoli, Taiwan
[4] Natl Cheng Kung Univ, Dept Geomat, 1 Univ Rd, Tainan 701, Taiwan
[5] Natl Cheng Kung Univ, Res Ctr Environm Trace Tox Subst, Tainan, Taiwan
[6] Acad Sinica, Res Ctr Environm Changes, Taipei, Taiwan
[7] Natl Taiwan Univ, Dept Atmospher Sci, Taipei, Taiwan
[8] Natl Taiwan Univ, Inst Environm Hlth, Taipei, Taiwan
Keywords
PM2.5; Land-use regression; Variable selection; Machine learning; Extreme gradient boosting; AIR-POLLUTION; NO2; EXPOSURE; CHINA; PM10; PARTICLES; IMPACT; AREAS; MASS;
DOI
10.1016/j.envpol.2021.116846
Chinese Library Classification number
X [Environmental science, safety science];
Discipline classification code
08 ; 0830 ;
Abstract
Ambient fine particulate matter (PM2.5) has been ranked as the sixth leading risk factor globally for death and disability. Because only a limited number of monitoring stations are available, modelling methods are required to capture continuous spatial and temporal variations in PM2.5 at a sufficient resolution. This study utilized a land use regression (LUR) model with machine learning to assess the spatial-temporal variability of PM2.5. Daily average PM2.5 data were collected from 73 fixed air quality monitoring stations belonging to the Taiwan EPA on the main island of Taiwan. Nearly 280,000 observations from 2006 to 2016 were used for the analysis. Several datasets were collected to determine spatial predictor variables, including the EPA environmental resources dataset, a meteorological dataset, a land-use inventory, a landmark dataset, a digital road network map, a digital terrain model, the MODIS Normalized Difference Vegetation Index (NDVI) database, and a power plant distribution dataset. First, conventional LUR and Hybrid Kriging-LUR were utilized to identify the important predictor variables. Then, deep neural network, random forest, and XGBoost algorithms were used to fit the prediction model based on the variables selected by the LUR models. Data splitting, 10-fold cross validation, external data verification, and seasonal-based and county-based validation methods were used to verify the robustness of the developed models. The results demonstrated that the proposed conventional LUR and Hybrid Kriging-LUR models captured 58% and 89% of PM2.5 variations, respectively. When the XGBoost algorithm was incorporated, the explanatory power of the models increased to 73% and 94%, respectively. The Hybrid Kriging-LUR with the XGBoost algorithm outperformed the other integrated methods. This study demonstrates the value of combining a Hybrid Kriging-LUR model and an XGBoost algorithm for estimating the spatial-temporal variability of PM2.5 exposures.
© 2021 Elsevier Ltd. All rights reserved.
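The abstract reports model performance as the share of PM2.5 variation captured (R²) and lists 10-fold cross validation among the verification methods. The sketch below illustrates only that validation idea and is not the authors' code: it uses synthetic data, a single hypothetical predictor (`road_density`), and a one-variable least-squares fit standing in for the LUR and XGBoost models; all function names are assumptions made for illustration.

```python
import random


def fit_ols(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_r2(x, y, k=10, seed=0):
    """Predict every observation from a model fitted without its fold,
    then score the pooled out-of-fold predictions with R^2."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    preds = [0.0] * len(x)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_ols([x[i] for i in train], [y[i] for i in train])
        for i in held_out:
            preds[i] = a + b * x[i]
    return r_squared(y, preds)


# Synthetic stand-in data (hypothetical; not the Taiwan EPA observations).
rng = random.Random(42)
road_density = [rng.uniform(0, 10) for _ in range(200)]
pm25 = [12.0 + 2.5 * d + rng.gauss(0, 3) for d in road_density]

cv_r2 = k_fold_r2(road_density, pm25, k=10)
print(f"10-fold cross-validated R^2: {cv_r2:.2f}")
```

Pooling the out-of-fold predictions before scoring gives one overall R²; averaging per-fold R² values is a common alternative, and the abstract does not say which convention the study used.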
Pages: 10
Related papers
50 in total
  • [1] Estimate annual and seasonal PM1, PM2.5 and PM10 concentrations using land use regression model
    Miri, Mohammad
    Ghassoun, Yahya
    Dovlatabadi, Afshin
    Ebrahimnejad, Ali
    Loewner, Marc-Oliver
    [J]. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY, 2019, 174 : 137 - 145
  • [2] Applying land use regression model to estimate spatial variation of PM2.5 in Beijing, China
    Jiansheng Wu
    Jiacheng Li
    Jian Peng
    Weifeng Li
    Guang Xu
    Chengcheng Dong
    [J]. Environmental Science and Pollution Research, 2015, 22 : 7045 - 7061
  • [3] Applying land use regression model to estimate spatial variation of PM2.5 in Beijing, China
    Wu, Jiansheng
    Li, Jiacheng
    Peng, Jian
    Li, Weifeng
    Xu, Guang
    Dong, Chengcheng
    [J]. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH, 2015, 22 (09) : 7045 - 7061
  • [4] Estimating ground-level PM2.5 using subset regression model and machine learning algorithms in Asian megacity, Dhaka, Bangladesh
    Islam, Abu Reza Md. Towfiqul
    Al Awadh, Mohammed
    Mallick, Javed
    Pal, Subodh Chandra
    Chakraborty, Rabin
    Fattah, Md. Abdul
    Ghose, Bonosri
    Kakoli, Most. Kulsuma Akther
    Islam, Md. Aminul
    Naqvi, Hasan Raja
    Bilal, Muhammad
    Elbeltagi, Ahmed
    [J]. AIR QUALITY ATMOSPHERE AND HEALTH, 2023, 16 (06) : 1117 - 1139
  • [5] Estimating ground-level PM2.5 using subset regression model and machine learning algorithms in Asian megacity, Dhaka, Bangladesh
    Abu Reza Md. Towfiqul Islam
    Mohammed Al Awadh
    Javed Mallick
    Subodh Chandra Pal
    Rabin Chakraborty
    Md. Abdul Fattah
    Bonosri Ghose
    Most. Kulsuma Akther Kakoli
    Md. Aminul Islam
    Hasan Raja Naqvi
    Muhammad Bilal
    Ahmed Elbeltagi
    [J]. Air Quality, Atmosphere & Health, 2023, 16 : 1117 - 1139
  • [6] A Land Use Regression Model for Predicting PM2.5 in Mexico City
    Texcalac Sangrador, J. L.
    Escamilla Nunez, M. C.
    Barraza Villarreal, A.
    Hernandez Cadena, L.
    Jerrett, M.
    Romieu, I
    [J]. EPIDEMIOLOGY, 2008, 19 (06) : S259 - S259
  • [7] A hybrid satellite and land use regression model of source-specific PM2.5 and PM2.5 constituents
    Rahman, Md Mostafijur
    Thurston, George
    [J]. ENVIRONMENT INTERNATIONAL, 2022, 163
  • [8] Estimating PM2.5 concentration using the machine learning GA-SVM method to improve the land use regression model in Shaanxi, China
    Zhang, Ping
    Ma, Wenjie
    Wen, Feng
    Liu, Lei
    Yang, Lianwei
    Song, Jia
    Wang, Ning
    Liu, Qi
    [J]. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY, 2021, 225
  • [9] Estimating PM2.5 Concentrations Using an Improved Land Use Regression Model in Zhejiang, China
    Zheng, Sheng
    Zhang, Chengjie
    Wu, Xue
    [J]. ATMOSPHERE, 2022, 13 (08)
  • [10] Using MAIAC AOD to verify the PM2.5 spatial patterns of a land use regression model
    Li, Runkui
    Ma, Tianxiao
    Xu, Qun
    Song, Xianfeng
    [J]. ENVIRONMENTAL POLLUTION, 2018, 243 : 501 - 509