Datatrack: An R package for managing data in a multi-stage experimental workflow

Authors
Eichinski, Philip [1]
Roe, Paul [1]
Affiliations
[1] Queensland Univ Technol, Sci & Engn Fac, Brisbane, Qld, Australia
Keywords
computational science; data provenance; R language; R package; workflow
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
In experimental research using computation, a workflow is a sequence of data processing or analysis steps in which the output of one step may be used as the input of another. The processing steps may involve user-supplied parameters that, when modified, result in a new version of the input to the downstream steps, which in turn generate new versions of their own output. As more experimentation is done, the results of these various steps can become numerous. It is important to keep track of which data output depends on which other generated data, and which parameters were used. In many situations, scientific workflow management systems solve this problem, but these systems are best suited to collaborative, distributed experiments that use a variety of services, possibly with batch processing of parameter sweeps. This paper presents an R package for managing and navigating a network of interdependent data. It is intended as a lightweight tool that provides some visual data provenance information to experimenters, allowing them to manage their generated data as they run experiments within their familiar scripting environment, where it may not be desirable to commit to a full-blown, comprehensive workflow manager. The package consists of wrapper functions for writing and reading output data that can be called from within R analysis scripts, as well as a visualization of the data-output dependency graph rendered within the RStudio console. Thus, it benefits the experimenter while requiring minimal commitment to integrate it into their existing working environment.
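The idea in the abstract, where each written output records the parameters and upstream data it depends on, can be illustrated with a minimal sketch. This is not Datatrack's API (the package is written in R and its function names are not given here). The class `DataRegistry` and the methods `write_output`, `read_output` and `lineage` are hypothetical names chosen for illustration, and the storage is in memory only.

```python
import hashlib
import json


class DataRegistry:
    """Hypothetical in-memory registry of versioned step outputs (illustration only)."""

    def __init__(self):
        self.records = {}  # version id -> metadata and data

    def write_output(self, step, params, data, depends_on=()):
        # The version id is derived from the step name, its parameters and the
        # upstream versions, so a changed parameter yields a new version.
        key = json.dumps([step, params, sorted(depends_on)], sort_keys=True)
        version = step + "-" + hashlib.sha1(key.encode()).hexdigest()[:8]
        self.records[version] = {
            "step": step,
            "params": params,
            "depends_on": list(depends_on),
            "data": data,
        }
        return version

    def read_output(self, version):
        return self.records[version]["data"]

    def lineage(self, version):
        """Return every upstream version the given version depends on, transitively."""
        seen = []
        stack = list(self.records[version]["depends_on"])
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.append(v)
                stack.extend(self.records[v]["depends_on"])
        return seen


# Usage: two parameter settings of the same step give two distinct versions.
registry = DataRegistry()
raw = registry.write_output("load", {"path": "example.csv"}, [1, 2, 3, 4])
f1 = registry.write_output("filter", {"threshold": 2}, [3, 4], depends_on=[raw])
f2 = registry.write_output("filter", {"threshold": 3}, [4], depends_on=[raw])
summary = registry.write_output("summarise", {}, 4, depends_on=[f2])
```

Calling `registry.lineage(summary)` walks the recorded dependencies back through `f2` to `raw`, which is the kind of provenance question the abstract describes. A real tool would persist the records to disk and render the graph visually.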
Pages: 147-154
Number of pages: 8