Managing Provenance of Implicit Data Flows in Scientific Experiments

Cited by: 1
Authors
Neves, Vitor C. [1, 3]
De Oliveira, Daniel [1, 3]
Ocana, Kary A. C. S. [2]
Braganholo, Vanessa [1, 3]
Murta, Leonardo [1, 3]
Affiliations
[1] Univ Fed Fluminense, Niteroi, RJ, Brazil
[2] Lab Nacl Comp Cient, Av Getulio Vargas 333, BR-25651075 Petropolis, RJ, Brazil
[3] Inst Comp, Rua Passo da Patria 156, BR-24210240 Niteroi, RJ, Brazil
Keywords
Implicit data flows; implicit provenance; scientific experiments; workflows; SYSTEM
DOI
10.1145/3053372
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Scientific experiments modeled as scientific workflows may create, change, or access data products that are not explicitly referenced in the workflow specification, leading to implicit data flows. A lack of knowledge about implicit data flows makes experiments hard to understand and reproduce. In this article, we present ProvMonitor, an approach that identifies the creation of, changes to, and access to data products, even within implicit data flows. ProvMonitor links this information to the workflow activity that generated it, allowing scientists to compare data products within and across trials of the same workflow and to identify side effects on data evolution caused by implicit data flows. We evaluated ProvMonitor and observed that it could answer queries for scenarios that demand specific knowledge of implicit provenance.
Pages: 22
Related papers
  • [1] Connecting scientific data to scientific experiments with provenance
    Miles, Simon
    Deelman, Ewa
    Groth, Paul
    Vahi, Karan
    Mehta, Gaurang
    Moreau, Luc
    E-SCIENCE 2007: THIRD IEEE INTERNATIONAL CONFERENCE ON E-SCIENCE AND GRID COMPUTING, PROCEEDINGS, 2007, : 179 - +
  • [2] Semantic provenance for eScience - Managing the deluge of scientific data
    Sahoo, Satya S.
    Sheth, Amit
    Henson, Cory
    IEEE INTERNET COMPUTING, 2008, 12 (04) : 46 - 54
  • [3] Managing data provenance in database
    Liu, Xiping
    Wan, Changxuan
    Jiang, Tengjiao
    Journal of Information and Computational Science, 2009, 6 (01) : 423 - 431
  • [4] Towards Planning Scientific Experiments through Declarative Model Discovery in Provenance Data
    Silva, Mateus Ferreira
    Baiao, Fernanda Araujo
    Revoredo, Kate
    2014 IEEE 10TH INTERNATIONAL CONFERENCE ON ESCIENCE WORKSHOPS (ESCIENCE 2014), VOL 2, 2014, : 95 - 98
  • [5] Data Ecosystems for Scientific Experiments: Managing Combustion Experiments and Simulation Analyses in Chemical Engineering
    Ramalli, Edoardo
    Scalia, Gabriele
    Pernici, Barbara
    Stagni, Alessandro
    Cuoci, Alberto
    Faravelli, Tiziano
    FRONTIERS IN BIG DATA, 2021, 4
  • [6] Managing Scientific Data
    Ailamaki, Anastasia
    Kantere, Verena
    Dash, Debabrata
    COMMUNICATIONS OF THE ACM, 2010, 53 (06) : 68 - 78
  • [7] Project histories: Managing data provenance across collection-oriented scientific workflow runs
    Bowers, Shawn
    McPhillips, Timothy
    Wu, Martin
    Ludaescher, Bertram
    DATA INTEGRATION IN THE LIFE SCIENCES, PROCEEDINGS, 2007, 4544 : 122 - +
  • [8] Temporal Representation for Scientific Data Provenance
    Chen, Peng
    Plale, Beth
    Aktas, Mehmet S.
    2012 IEEE 8TH INTERNATIONAL CONFERENCE ON E-SCIENCE (E-SCIENCE), 2012,
  • [9] Provenance and credibility in scientific data repositories
    Fear, Kathleen
    Donaldson, Devan Ray
    ARCHIVAL SCIENCE, 2012, 12 (03) : 319 - 339
  • [10] A Tool for Scientific Provenance of Data and Software
    Ceguerra, Anna V.
    Liddicoat, Peter V.
    Ringer, Simon P.
    Goscinski, Wojtek J.
    Androulakis, Steve
    2013 IEEE 16TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE 2013), 2013, : 561 - 565