A multi-resolution ensemble model of three decision-tree-based algorithms to predict daily NO2 concentration in France 2005-2022

Cited by: 0
Authors
Barbalat, Guillaume [1, 2, 3]
Hough, Ian [4 ]
Dorman, Michael [5 ]
Lepeule, Johanna [1 ]
Kloog, Itai [5, 6]
Affiliations
[1] Univ Grenoble Alpes, Inst Adv Biosci IAB, Team Environm Epidemiol Appl Dev & Resp Hlth, Inserm,CNRS, Grenoble, France
[2] Hop Le Vinatier, Ctr Ressource Rehabil Psychosociale & Remediat Co, Pole Ctr Gauche Rive, UMR 5229,CNRS, Villeurbanne, France
[3] Univ Claude Bernard Lyon 1, Villeurbanne, France
[4] Univ Grenoble Alpes, CNRS, INRAE, IRD,INP G,IGE,UMR 5001, Grenoble, France
[5] Ben Gurion Univ Negev, Dept Environm Geoinformat & Urban Planning Sci, Beer Sheva, Israel
[6] Icahn Sch Med Mt Sinai, Dept Environm Med & Publ Hlth, New York, NY USA
Keywords
Nitrogen dioxide; 200 m resolution; Daily predictions; Spatio-temporal modeling; Decision-tree; Spatio-temporal blocking; AIR-POLLUTION; OZONE; RETRIEVAL; PM2.5;
DOI
10.1016/j.envres.2024.119241
Chinese Library Classification number
X [Environmental science, safety science];
Discipline classification code
08 ; 0830 ;
Abstract
Understanding and managing the health effects of nitrogen dioxide (NO2) requires high-resolution spatiotemporal exposure maps. Here, we developed a multi-stage, multi-resolution ensemble model that predicts daily NO2 concentration across continental France from 2005 to 2022. Innovations of this work include the computation of daily predictions at a 200 m resolution in large urban areas and the use of a spatio-temporal blocking procedure to avoid data leakage and ensure fair performance estimation. Predictions were obtained after three cascading stages of modeling: (1) predicting NO2 total column density from the Ozone Monitoring Instrument satellite; (2) predicting daily NO2 concentrations at a 1 km spatial resolution using a large set of potential predictors such as predictions obtained from stage 1, land-cover and road traffic data; and (3) predicting residuals from stage 2 models at a 200 m resolution in large urban areas. The latter two stages used a generalized additive model to ensemble the predictions of three decision-tree algorithms (random forest, extreme gradient boosting and categorical boosting). Cross-validated performance of our ensemble models was very good overall, with a ten-fold cross-validated R2 of 0.83 for the 1 km model and of 0.69 for the 200 m model. All three base learners contributed to the ensemble predictions to varying degrees depending on time and space. In sum, our multi-stage approach was able to predict daily NO2 concentrations with a relatively low error. Ensembling the predictions maximizes the chance of obtaining accurate values if one base learner fails in a specific area or at a particular time, by relying on the other learners. To the best of our knowledge, this is the first study aiming to predict NO2 concentrations in France with such a high spatiotemporal resolution, large spatial extent, and long temporal coverage. Exposure estimates are available to investigate NO2 health effects in epidemiological studies.
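The abstract mentions a spatio-temporal blocking procedure but does not describe it, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes that observations are grouped into blocks defined by a spatial grid cell and a time window, and that whole blocks are assigned to cross-validation folds so that nearby, near-simultaneous observations never straddle a training/test split. The function names, the 50 km and 30-day block sizes, and the sample records are all hypothetical.

```python
from collections import defaultdict
import random


def block_id(x_km, y_km, day, block_km=50.0, block_days=30):
    """Map an observation to a (column, row, period) block.

    The block sizes are illustrative assumptions, not values from the paper.
    """
    return (int(x_km // block_km), int(y_km // block_km), int(day // block_days))


def blocked_folds(observations, n_folds=10, seed=0, **block_kwargs):
    """Return one fold index per observation; a whole block shares one fold."""
    # Group observation indices by their spatio-temporal block.
    blocks = defaultdict(list)
    for idx, (x_km, y_km, day) in enumerate(observations):
        blocks[block_id(x_km, y_km, day, **block_kwargs)].append(idx)

    # Shuffle the blocks reproducibly, then deal them out to folds in turn.
    block_keys = sorted(blocks)
    random.Random(seed).shuffle(block_keys)

    folds = [None] * len(observations)
    for position, key in enumerate(block_keys):
        for idx in blocks[key]:
            folds[idx] = position % n_folds
    return folds


# Hypothetical monitor records: (x in km, y in km, day index).
observations = [
    (12.0, 40.0, 3),
    (14.5, 41.0, 10),   # same grid cell and time window as the first record
    (120.0, 40.0, 3),   # different grid cell
    (12.0, 40.0, 95),   # same grid cell, later time window
]
folds = blocked_folds(observations, n_folds=2)
```

With fold labels built this way, each model would be trained on all folds but one and evaluated on the held-out fold, so that the reported R2 is not inflated by test points that sit next to training points in space and time.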
Pages: 13