Managing Provenance of Implicit Data Flows in Scientific Experiments

Cited by: 1
Authors
Neves, Vitor C. [1, 3]
De Oliveira, Daniel [1, 3]
Ocana, Kary A. C. S. [2]
Braganholo, Vanessa [1, 3]
Murta, Leonardo [1, 3]
Affiliations
[1] Univ Fed Fluminense, Niteroi, RJ, Brazil
[2] Lab Nacl Comp Cient, Av Getulio Vargas 333, BR-25651075 Petropolis, RJ, Brazil
[3] Inst Comp, Rua Passo da Patria 156, BR-24210240 Niteroi, RJ, Brazil
Keywords
Implicit data flows; implicit provenance; scientific experiments; workflows; SYSTEM
DOI
10.1145/3053372
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Scientific experiments modeled as scientific workflows may create, change, or access data products not explicitly referenced in the workflow specification, leading to implicit data flows. The lack of knowledge about implicit data flows makes the experiments hard to understand and reproduce. In this article, we present ProvMonitor, an approach that identifies the creation of, changes to, or access to data products even within implicit data flows. ProvMonitor links this information with the workflow activity that generated it, allowing scientists to compare data products within and across trials of the same workflow and to identify side effects on data evolution caused by implicit data flows. We evaluated ProvMonitor and observed that it could answer queries for scenarios that demand specific knowledge related to implicit provenance.
Pages: 22