Datatrack: An R package for managing data in a multi-stage experimental workflow

Authors
Eichinski, Philip [1]
Roe, Paul [1]
Affiliations
[1] Queensland University of Technology, Science and Engineering Faculty, Brisbane, Qld, Australia
Keywords
computational science; data provenance; R language; R package; workflow
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In computational experimental research, a workflow is a sequence of data processing or analysis steps in which the output of one step may be used as the input of another. The processing steps may take user-supplied parameters that, when modified, produce a new version of the input to the downstream steps, which in turn generate new versions of their own output. As more experiments are run, the results of these steps can become numerous. It is important to keep track of which data output depends on which other generated data, and which parameters were used. In many situations, scientific workflow management systems solve this problem, but these systems are best suited to collaborative, distributed experiments that use a variety of services, possibly with batch-processed parameter sweeps. This paper presents an R package for managing and navigating a network of interdependent data. It is intended as a lightweight tool that gives experimenters visual data provenance information so that they can manage their generated data as they run experiments within their familiar scripting environment, where committing to a full-blown workflow manager may not be desirable. The package consists of wrapper functions for writing and reading output data, which can be called from within R analysis scripts, and a visualization of the data-output dependency graph rendered within the RStudio console. It thus benefits the experimenter while requiring minimal commitment to integrate into an existing working environment.
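The idea of tracking which output depends on which upstream data and parameters can be illustrated with a minimal sketch. This is not the Datatrack API: it is written in Python rather than R, and the `Tracker` class and its `write_output`, `read_output`, and `lineage` methods are hypothetical names invented for this illustration. It assumes only what the abstract states, namely that changing a step's parameters yields a new version of its output, and that each output records the versions it was derived from.

```python
import hashlib
import json


class Tracker:
    """Hypothetical in-memory store of versioned step outputs (not the Datatrack API)."""

    def __init__(self):
        self.records = {}  # version id -> metadata and data

    def write_output(self, step, params, data, depends_on=()):
        # Derive the version id from the step name, its parameters, and its
        # upstream versions, so that changing any of them gives a new version.
        key = json.dumps([step, params, sorted(depends_on)], sort_keys=True)
        version = step + "-" + hashlib.sha1(key.encode()).hexdigest()[:8]
        self.records[version] = {
            "step": step,
            "params": params,
            "depends_on": list(depends_on),
            "data": data,
        }
        return version

    def read_output(self, version):
        return self.records[version]["data"]

    def lineage(self, version):
        # Walk the dependency graph upstream and collect every ancestor version.
        seen = []
        stack = list(self.records[version]["depends_on"])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.append(current)
                stack.extend(self.records[current]["depends_on"])
        return seen


tracker = Tracker()
raw = tracker.write_output("load", {"source": "sensor_a"}, [1, 2, 3, 4])

# The same step run with two parameter values produces two distinct versions.
values = tracker.read_output(raw)
filtered_2 = tracker.write_output(
    "filter", {"threshold": 2}, [x for x in values if x > 2], depends_on=[raw]
)
filtered_3 = tracker.write_output(
    "filter", {"threshold": 3}, [x for x in values if x > 3], depends_on=[raw]
)

# A downstream step records which filtered version it was computed from.
total = tracker.write_output(
    "summarise", {}, sum(tracker.read_output(filtered_3)), depends_on=[filtered_3]
)
```

Calling `tracker.lineage(total)` then returns the filtered version and the raw version that the summary was derived from, which is the kind of provenance information a dependency graph view would display.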
Pages: 147-154
Page count: 8