Reconstructing missing data by comparing interpolation techniques: Applications for long-term water quality data

Cited by: 10
Authors
Larson, Danelle M. [1]
Bungula, Wako [2]
Lee, Amber [3]
Stockdill, Alaina [3]
McKean, Casey [3]
Miller, Frederick Forrest [3]
Davis, Killian [3]
Erickson, Richard A. [1]
Hlavacek, Enrika [1]
Affiliations
[1] US Geol Survey, Upper Midwest Environm Sci Ctr, La Crosse, WI 54603 USA
[2] Univ Wisconsin La Crosse, Dept Math & Stat, La Crosse, WI USA
[3] Univ Wisconsin La Crosse, Res Experience Undergraduates Program, La Crosse, WI USA
Source
LIMNOLOGY AND OCEANOGRAPHY-METHODS | 2023 / Vol. 21 / No. 07
Keywords
MACHINE LEARNING-METHODS; SPATIAL INTERPOLATION; RIVER;
DOI
10.1002/lom3.10556
Chinese Library Classification
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Missing data are typical yet must be addressed for proper inferences or expanding datasets to guide our limnological understanding and management of aquatic systems. Interpolation methods (i.e., estimating missing values using known values within the dataset) can alleviate data gaps and common problems. We compared seven popular interpolation methods for predicting substantial missingness in a long-term water quality dataset from the Upper Mississippi River, U.S.A. The dataset included 80,000 sampling sites collected over 30 yr that had substantial missingness for total nitrogen (TN), total phosphorus (TP), and water velocity. For all three interpolated water quality variables, random forests had very high prediction accuracy and outperformed the methods of ordinary kriging, polynomial regressions, regression trees, and inverse distance weighting. TP had a mean absolute error (MAE) of 0.03 mg L^-1, TN had an MAE of 0.39 mg L^-1, and water velocity had an MAE of 0.10 m s^-1. The random forests' error rates were mapped and showed low spatiotemporal variability across the riverscape, indicating high model performance across many habitat types and large spatial scales. In the current era of "big data," interpolation becomes an imperative step prior to ecological analyses yet remains unfamiliar and underutilized. Our research briefly describes the importance of addressing missingness and provides a roadmap to conduct model intercomparisons of other big datasets. We also share adaptable data analysis scripts, which allow others to readily conduct interpolation comparisons for many limnology applications and contexts.
Pages: 435-449
Number of pages: 15
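The model intercomparison described in the abstract (hold out known values, predict them with each interpolation method, and rank the methods by MAE) can be illustrated with a minimal Python sketch. This is not the authors' shared script: it covers only inverse distance weighting and a global-mean baseline rather than the seven methods compared in the paper, the data are synthetic, and every function name (`idw_predict`, `mean_predict`, `compare_methods`, `make_synthetic_points`) is a hypothetical one chosen for illustration.

```python
import math
import random


def idw_predict(known, target, power=2.0):
    """Inverse distance weighting: weighted mean of known values,
    with weights 1 / distance**power. `known` holds (x, y, value) tuples."""
    numerator = 0.0
    denominator = 0.0
    for x, y, value in known:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a sampled location
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def mean_predict(known, target):
    """Baseline that ignores location and returns the mean of known values."""
    return sum(value for _, _, value in known) / len(known)


def mean_absolute_error(observed, predicted):
    """MAE: average absolute difference between observed and predicted."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def compare_methods(points, methods, holdout_fraction=0.2, seed=0):
    """Hide a random fraction of points, predict them from the rest with
    each method, and return {method_name: MAE on the hidden points}."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * holdout_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    observed = [value for _, _, value in test]
    scores = {}
    for name, predict in methods.items():
        predicted = [predict(train, (x, y)) for x, y, _ in test]
        scores[name] = mean_absolute_error(observed, predicted)
    return scores


def make_synthetic_points(n=200, seed=1):
    """Synthetic (x, y, value) samples with a smooth spatial trend plus
    noise. An assumption for illustration; not the river dataset."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x, y = rng.uniform(0, 10), rng.uniform(0, 10)
        value = 0.1 * x + 0.05 * y + rng.gauss(0, 0.02)
        points.append((x, y, value))
    return points


if __name__ == "__main__":
    methods = {"idw": idw_predict, "global_mean": mean_predict}
    scores = compare_methods(make_synthetic_points(), methods)
    for name, mae in sorted(scores.items(), key=lambda item: item[1]):
        print(f"{name}: MAE = {mae:.3f}")
```

Further methods (for example a random forest from a machine learning library) could be added to the `methods` dictionary as long as they follow the same `predict(known, target)` convention, which keeps the held-out set identical across methods.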