Reconstructing missing data by comparing interpolation techniques: Applications for long-term water quality data

Cited by: 10
Authors
Larson, Danelle M. [1]
Bungula, Wako [2]
Lee, Amber [3]
Stockdill, Alaina [3]
McKean, Casey [3]
Miller, Frederick Forrest [3]
Davis, Killian [3]
Erickson, Richard A. [1]
Hlavacek, Enrika [1]
Affiliations
[1] US Geol Survey, Upper Midwest Environm Sci Ctr, La Crosse, WI 54603 USA
[2] Univ Wisconsin La Crosse, Dept Math & Stat, La Crosse, WI USA
[3] Univ Wisconsin La Crosse, Res Experience Undergraduates Program, La Crosse, WI USA
Source
LIMNOLOGY AND OCEANOGRAPHY-METHODS | 2023 / Vol. 21 / Issue 07
Keywords
MACHINE LEARNING-METHODS; SPATIAL INTERPOLATION; RIVER;
DOI
10.1002/lom3.10556
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07; 0710; 09
Abstract
Missing data are typical yet must be addressed for proper inferences or expanding datasets to guide our limnological understanding and management of aquatic systems. Interpolation methods (i.e., estimating missing values using known values within the dataset) can alleviate data gaps and common problems. We compared seven popular interpolation methods for predicting substantial missingness in a long-term water quality dataset from the Upper Mississippi River, U.S.A. The dataset included 80,000 sampling sites collected over 30 yr that had substantial missingness for total nitrogen (TN), total phosphorus (TP), and water velocity. For all three interpolated water quality variables, random forests had very high prediction accuracy and outperformed the methods of ordinary kriging, polynomial regressions, regression trees, and inverse distance weighting. TP had a mean absolute error (MAE) of 0.03 mg/L, TN had an MAE of 0.39 mg/L, and water velocity had an MAE of 0.10 m/s. The random forests' error rates were mapped and showed low spatiotemporal variability across the riverscape, indicating high model performance across many habitat types and large spatial scales. In the current era of "big data," interpolation becomes an imperative step prior to ecological analyses yet remains unfamiliar and underutilized. Our research briefly describes the importance of addressing missingness and provides a roadmap to conduct model intercomparisons of other big datasets. We also share adaptable data analysis scripts, which allow others to readily conduct interpolation comparisons for many limnology applications and contexts.
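As a rough illustration of the evaluation idea in the abstract, the sketch below applies inverse distance weighting (one of the methods the abstract names) to a few known sites and scores the predictions on held-out sites with the mean absolute error. It is not the authors' shared script: the function names, the `power` parameter, and the toy coordinates and values are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def idw_predict(known, target, power=2.0):
    """Inverse distance weighting: estimate the value at `target` (x, y)
    from `known`, a list of ((x, y), value) pairs. Hypothetical helper."""
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in known:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # The target coincides with a known site, so reuse its value.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical toy data: two sites with known values, two held-out sites.
known = [((0.0, 0.0), 1.0), ((2.0, 0.0), 3.0)]
held_out = [((1.0, 0.0), 2.5), ((0.0, 0.0), 1.0)]

predictions = [idw_predict(known, xy) for xy, _ in held_out]
mae = mean_absolute_error([value for _, value in held_out], predictions)
print(predictions, mae)
```

A comparison of several methods would repeat the same hold-out scoring for each interpolator and rank them by MAE; a random forest or kriging model would simply replace `idw_predict` in that loop.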
Pages: 435 - 449
Number of pages: 15