The Data Synergy Effects of Time-Series Deep Learning Models in Hydrology

Cited by: 43
Authors
Fang, Kuai [1, 2]
Kifer, Daniel [3]
Lawson, Kathryn [2]
Feng, Dapeng [2]
Shen, Chaopeng [2]
Affiliations
[1] Stanford Univ, Dept Earth Syst Sci, Stanford, CA 94305 USA
[2] Penn State Univ, Dept Civil & Environm Engn, University Pk, PA 16802 USA
[3] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
Keywords
HYDRAULIC GEOMETRY RELATIONSHIPS; STREAMFLOW; EVAPOTRANSPIRATION; CLASSIFICATION; PREDICTION; CATCHMENTS; PATTERNS; REGIONS;
DOI
10.1029/2021WR029583
Chinese Library Classification
X [Environmental Science, Safety Science]
Discipline Classification Code
08; 0830
Abstract
When fitting statistical models to variables in geoscientific disciplines such as hydrology, it is a customary practice to stratify a large domain into multiple regions (or regimes) and study each region separately. Traditional wisdom suggests that models built for each region separately will have higher performance because of homogeneity within each region. However, each stratified model has access to fewer and less diverse data points. Here, through two hydrologic examples (soil moisture and streamflow), we show that conventional wisdom may no longer hold in the era of big data and deep learning (DL). We systematically examined an effect we call data synergy, where the results of the DL models improved when data were pooled together from characteristically different regions. The performance of the DL models benefited from modest diversity in the training data compared to a homogeneous training set, even with similar data quantity. Moreover, allowing heterogeneous training data makes eligible much larger training datasets, which is an inherent advantage of DL. A large, diverse data set is advantageous in terms of representing extreme events and future scenarios, which has strong implications for climate change impact assessment. The results here suggest the research community should place greater emphasis on data sharing.

Plain Language Summary
Traditionally with statistical methods used in hydrology, we split the domain into relatively homogeneous regimes, for each of which we can create a simple model, that is, a local model. However, in the era of big data machine learning, we show that this is often the opposite of what should be done. With deep learning models, we should compile a large and heterogeneous data set and compare the local model to a model trained with all the data (global model). Including heterogeneous training samples may improve the results compared to the local model. We call this the data synergy effect, and it results from two main factors. First, deep learning models are complex enough to accommodate different training instances, inherently permitting larger training datasets with more extreme events and changing trends. Second, with a heterogeneous training data set, deep learning models may be able to learn both the underlying similarities and factors contributing to differences between regions.
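The local-versus-global comparison described in the summary can be sketched on toy data. This is only an illustration under stated assumptions, not the authors' experiment: an ordinary least-squares fit stands in for the deep learning models, the three regions and their static attribute values are synthetic, and every name (`make_region`, `design`, `compare_local_and_global`) is hypothetical. The local model sees one region's samples; the global model sees the pooled samples plus a feature built from the region attribute, which is what lets it represent differences between regions.

```python
import numpy as np


def make_region(rng, n, attribute):
    """Synthetic region (assumption): response = (1 + attribute) * forcing + noise."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = (1.0 + attribute) * x + rng.normal(0.0, 0.1, size=n)
    return x, y


def design(x, attribute):
    """Feature matrix: bias, forcing, and forcing scaled by the static attribute."""
    a = np.full_like(x, attribute)
    return np.column_stack([np.ones_like(x), x, a * x])


def fit(features, y):
    """Least-squares fit; a stand-in for training a DL model."""
    coef, *_ = np.linalg.lstsq(features, y, rcond=None)
    return coef


def rmse(features, coef, y):
    return float(np.sqrt(np.mean((features @ coef - y) ** 2)))


def compare_local_and_global(seed=0, n_train=20, n_test=200):
    rng = np.random.default_rng(seed)
    attributes = {"A": 0.2, "B": 0.8, "C": 1.5}  # hypothetical region attributes
    train = {k: make_region(rng, n_train, a) for k, a in attributes.items()}
    test = {k: make_region(rng, n_test, a) for k, a in attributes.items()}

    # Global model: pool the training samples from every region.
    pooled_x = np.vstack([design(train[k][0], attributes[k]) for k in attributes])
    pooled_y = np.concatenate([train[k][1] for k in attributes])
    global_coef = fit(pooled_x, pooled_y)

    results = {}
    for k, a in attributes.items():
        # Local model: one region only. The attribute is constant within a
        # region, so the attribute-scaled column is redundant and is dropped.
        local_coef = fit(design(train[k][0], a)[:, :2], train[k][1])
        test_x = design(test[k][0], a)
        results[k] = {
            "local": rmse(test_x[:, :2], local_coef, test[k][1]),
            "global": rmse(test_x, global_coef, test[k][1]),
        }
    return results


if __name__ == "__main__":
    for region, scores in compare_local_and_global().items():
        print(region, scores)
```

Which model scores better in this toy setup depends on the noise level, the sample counts, and how the regions differ; the sketch only shows how the two training sets are assembled and evaluated on the same held-out data.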
Pages: 18