Reproducibility Starts at the Source: R, Python, and Julia Packages for Retrieving USGS Hydrologic Data

Cited by: 3
Authors
Hodson, Timothy O. [1]
Decicco, Laura A. [2]
Hariharan, Jayaram A. [3]
Stanish, Lee F. [3]
Black, Scott [4]
Horsburgh, Jeffery S. [5]
Affiliations
[1] US Geol Survey, Cent Midwest Water Sci Ctr, Urbana, IL 61801 USA
[2] US Geol Survey, Upper Midwest Water Sci Ctr, Madison, WI 53726 USA
[3] US Geol Survey, Water Mission Area, Reston, VA 20192 USA
[4] Consortium Univ Advancement Hydrol Sci Inc CUAHSI, Arlington, MA 02476 USA
[5] Utah State Univ, Civil & Environm Engn, Logan, UT 84322 USA
Funding
U.S. National Science Foundation
Keywords
packaged workflows; water data; reproducibility; open science; open data; open source; R; Python; Julia; Jupyter; USGS
DOI
10.3390/w15244236
Chinese Library Classification
X [Environmental Science, Safety Science]
Discipline classification codes
08; 0830
Abstract
Much of modern science takes place in a computational environment, and, increasingly, that environment is programmed using R, Python, or Julia. Furthermore, most scientific data now live on the cloud, so the first step in many workflows is to query a cloud database and load the response into a computational environment for further analysis. Thus, tools that facilitate programmatic data retrieval represent a critical component in reproducible scientific workflows. Earth science is no different in this regard. To fulfill that basic need, we developed R, Python, and Julia packages providing programmatic access to the U.S. Geological Survey's National Water Information System database and the multi-agency Water Quality Portal. Together, these packages create a common interface for retrieving hydrologic data in the Jupyter ecosystem, which is widely used in water research, operations, and teaching. Source code, documentation, and tutorials for the packages are available on GitHub. Users can go there to learn, raise issues, or contribute improvements within a single platform, which helps foster better engagement and collaboration between data providers and their users.
Pages: 10