Datatrack: An R package for managing data in a multi-stage experimental workflow

Cited by: 0

Authors:

Eichinski, Philip [1]

Roe, Paul [1]

Affiliations:

[1] Queensland Univ Technol, Sci & Engn Fac, Brisbane, Qld, Australia

Source:

PROCEEDINGS OF THE 2016 IEEE 12TH INTERNATIONAL CONFERENCE ON E-SCIENCE (E-SCIENCE) | 2016

Keywords:

computational science; data provenance; R language; R package; workflow

DOI:

Not available

Chinese Library Classification number:

TP39 [Computer applications]

Discipline classification code:

081203; 0835

Abstract:

In computational experimental research, a workflow is a sequence of data processing or analysis steps in which the output of one step may be used as the input of another. The processing steps may involve user-supplied parameters that, when modified, result in a new version of the input to downstream steps, which in turn generate new versions of their own output. As more experimentation is done, the results of these various steps can become numerous. It is important to keep track of which data output depends on which other generated data, and which parameters were used. In many situations, scientific workflow management systems solve this problem, but these systems are best suited to collaborative, distributed experiments using a variety of services, possibly with batch processing of parameter sweeps. This paper presents an R package for managing and navigating a network of interdependent data. It is intended as a lightweight tool that provides some visual data provenance information to experimenters, allowing them to manage their generated data as they run experiments within their familiar scripting environment, where it may not be desirable to commit to a full-blown workflow manager. The package consists of wrapper functions for writing and reading output data that can be called from within R analysis scripts, as well as a visualization of the data-output dependency graph rendered within the RStudio console. Thus, it benefits the experimenter while requiring minimal commitment to integrate into their existing working environment.
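
The idea of wrapper functions that record parameters and dependencies can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the package's R API: the `Tracker` class, its `write_output`, `read_output`, and `lineage` methods, and the hash-based version identifiers are all hypothetical names and design choices assumed here, since the abstract does not describe the actual interface.

```python
import hashlib
import json


class Tracker:
    """Hypothetical in-memory tracker; not the datatrack package's API."""

    def __init__(self):
        # Maps a version id to the record describing one stored output.
        self.records = {}

    def write_output(self, step, params, data, depends_on=()):
        """Store `data` for `step` and return its version id.

        The id is derived from the step name, its parameters, and the ids
        of its inputs, so a changed upstream parameter yields new ids for
        every downstream output.
        """
        key = json.dumps([step, params, sorted(depends_on)], sort_keys=True)
        version = hashlib.sha1(key.encode("utf-8")).hexdigest()[:8]
        self.records[version] = {
            "step": step,
            "params": params,
            "depends_on": list(depends_on),
            "data": data,
        }
        return version

    def read_output(self, version):
        """Return the data stored under `version`."""
        return self.records[version]["data"]

    def lineage(self, version):
        """Return every version that `version` depends on, directly or not."""
        seen = []
        stack = list(self.records[version]["depends_on"])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.append(current)
                stack.extend(self.records[current]["depends_on"])
        return seen


tracker = Tracker()
raw = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]

# Step 1: the same filter run with two thresholds gives two output versions.
low = tracker.write_output("filter", {"min": 5}, [x for x in raw if x >= 5])
high = tracker.write_output("filter", {"min": 16}, [x for x in raw if x >= 16])

# Step 2: summarise each filtered version and record what it was derived from.
low_data = tracker.read_output(low)
high_data = tracker.read_output(high)
mean_low = tracker.write_output(
    "mean", {}, sum(low_data) / len(low_data), depends_on=[low]
)
mean_high = tracker.write_output(
    "mean", {}, sum(high_data) / len(high_data), depends_on=[high]
)
```

In this sketch, `tracker.lineage(mean_low)` lists the filter version that a given mean was computed from, which is the kind of provenance question the abstract describes. A real tool would persist records to disk and render the graph visually; both are omitted here.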

Pages: 147-154

Page count: 8
