Comparison of data-driven methods for linking extreme precipitation events to local and large-scale meteorological variables

Authors
Nafsika Antoniadou
Hjalte Jomo Danielsen Sørup
Jonas Wied Pedersen
Ida Bülow Gregersen
Torben Schmith
Karsten Arnbjerg-Nielsen
Affiliations
[1] Technical University of Denmark, Department of Environmental and Resource Engineering, Climate and Monitoring
[2] Danish Meteorological Institute, National Centre for Climate Research
[3] Rambøll Denmark A/S, Department of Climate Adaptation and Green Infrastructure
Keywords
Extreme precipitation; Meteorological drivers; Machine learning; Logistic regression; ROC curve
DOI
Not available
Abstract
Extreme precipitation events can have severe negative consequences for society, the economy, and the environment. It is therefore crucial to understand when such events occur. The literature offers a vast number of methods for analyzing their connection to meteorological drivers. Recently, however, there has been interest in using machine learning methods instead of classical statistical models. While a few studies in climate research have compared the performance of these two approaches, their conclusions are inconsistent. To determine whether an extreme event occurred locally, we trained models using logistic regression and three commonly used supervised machine learning algorithms suited to discrete outcomes: random forests, neural networks, and support vector machines. We used five explanatory variables from ERA5 (geopotential height at 500 hPa, convective available potential energy, total column water, sea surface temperature, and surface air temperature) together with local data from the Danish Meteorological Institute. During variable selection, we found that convective available potential energy has the strongest relationship with extreme events. Our results show that logistic regression performs similarly to the more complex machine learning algorithms in terms of discrimination, as measured by the area under the receiver operating characteristic curve (ROC AUC) and by other performance metrics designed for imbalanced datasets. Specifically, the ROC AUC was 0.86 for logistic regression and 0.87 for the best-performing machine learning algorithm. This study emphasizes the value of comparing machine learning with classical regression modeling, especially when a limited set of well-established explanatory variables is used.
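The abstract compares models by their ROC AUC. As a minimal, dependency-free sketch (not the authors' code, which this record does not describe), the block below fits a logistic regression by batch gradient descent on the log-loss and scores it with ROC AUC. The AUC is computed as the Mann-Whitney probability that a randomly chosen event case receives a higher score than a randomly chosen non-event case. The function names (`fit_logistic`, `predict_proba`, `roc_auc`), the learning rate, and the toy data are hypothetical. The single feature merely stands in for a standardized predictor such as convective available potential energy.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(event | x) = sigmoid(b0 + b . x) by batch gradient descent on the
    mean log-loss. xs: list of feature vectors, ys: list of 0/1 labels.
    lr and epochs are illustrative, untuned choices."""
    n, n_features = len(xs), len(xs[0])
    b0, b = 0.0, [0.0] * n_features
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * n_features
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x))) - y
            g0 += err
            for j in range(n_features):
                g[j] += err * x[j]
        b0 -= lr * g0 / n
        b = [bj - lr * gj / n for bj, gj in zip(b, g)]
    return b0, b


def predict_proba(model, xs):
    b0, b = model
    return [sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x))) for x in xs]


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive case scores higher; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one standardized predictor, 1 = extreme event.
xs = [[-1.5], [-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5], [2.0]]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

model = fit_logistic(xs, ys)
print(roc_auc(ys, predict_proba(model, xs)))  # → 0.9375
```

Because the fitted coefficient is positive, the predicted probabilities rank the cases in the same order as the predictor. The AUC therefore depends only on that ordering: 15 of the 16 event/non-event pairs are ranked correctly. The example scores the training data for brevity, which gives an optimistic estimate. A real comparison would score held-out data, for instance with scikit-learn's `LogisticRegression` and `roc_auc_score`.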
Pages: 4337-4357
Number of pages: 20