Managing Provenance of Implicit Data Flows in Scientific Experiments

Cited by: 1
Authors
Neves, Vitor C. [1 ,3 ]
De Oliveira, Daniel [1 ,3 ]
Ocana, Kary A. C. S. [2 ]
Braganholo, Vanessa [1 ,3 ]
Murta, Leonardo [1 ,3 ]
Affiliations
[1] Univ Fed Fluminense, Niteroi, RJ, Brazil
[2] Lab Nacl Comp Cient, Av Getulio Vargas 333, BR-25651075 Petropolis, RJ, Brazil
[3] Inst Comp, Rua Passo da Patria 156, BR-24210240 Niteroi, RJ, Brazil
Keywords
Implicit data flows; implicit provenance; scientific experiments; workflows; SYSTEM
DOI
10.1145/3053372
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Scientific experiments modeled as scientific workflows may create, change, or access data products not explicitly referenced in the workflow specification, leading to implicit data flows. The lack of knowledge about implicit data flows makes the experiments hard to understand and reproduce. In this article, we present ProvMonitor, an approach that identifies when data products are created, changed, or accessed, even within implicit data flows. ProvMonitor links this information with the workflow activity that generated it, allowing scientists to compare data products within and across trials of the same workflow and to identify side effects on data evolution caused by implicit data flows. We evaluated ProvMonitor and observed that it could answer queries for scenarios that demand specific knowledge related to implicit provenance.
Pages: 22
Related papers
50 in total
  • [21] A Scientific Data Provenance Harvester for Distributed Applications
    Stephan, Eric
    Raju, Bibi
    Elsethagen, Todd
    Pouchard, Line
    Gamboa, Carlos
    2017 NEW YORK SCIENTIFIC DATA SUMMIT (NYSDS), 2017
  • [22] Toward the modeling of data provenance in scientific publications
    Mahmood, Tariq
    Jami, Syed Imran
    Shaikh, Zubair Ahmed
    Mughal, Muhammad Hussain
    COMPUTER STANDARDS & INTERFACES, 2013, 35 (01) : 6 - 29
  • [23] Applying Provenance to Protect Attribution in Distributed Computational Scientific Experiments
    Gadelha, Luiz M. R., Jr.
    Mattoso, Marta
    PROVENANCE AND ANNOTATION OF DATA AND PROCESSES (IPAW 2014), 2015, 8628 : 139 - 151
  • [24] Temporal representation for mining scientific data provenance
    Chen, Peng
    Plale, Beth
    Aktas, Mehmet S.
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2014, 36 : 363 - 378
  • [25] Visualizing Large Scale Scientific Data Provenance
    Chen, Peng
    Plale, Beth
    2012 SC COMPANION: HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SCC), 2012, : 1388 - 1388
  • [26] Visualizing Large Scale Scientific Data Provenance
    Chen, Peng
    Plale, Beth
    2012 SC COMPANION: HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SCC), 2012, : 1387 - 1387
  • [27] RDFPROV: A relational RDF store for querying and managing scientific workflow provenance
    Chebotko, Artem
    Lu, Shiyong
    Fei, Xubo
    Fotouhi, Farshad
    DATA & KNOWLEDGE ENGINEERING, 2010, 69 (08) : 836 - 865
  • [28] Managing Provenance Data in Knowledge Graph Management Platforms
    Kleinsteuber, Erik
    Al Mustafa, Tarek
    Zander, Franziska
    König-Ries, Birgitta
    Babalou, Samira
    Datenbank-Spektrum, 2024, 24 (01) : 43 - 52
  • [29] Provenance Context Entity (PaCE): Scalable Provenance Tracking for Scientific RDF Data
    Sahoo, Satya S.
    Bodenreider, Olivier
    Hitzler, Pascal
    Sheth, Amit
    Thirunarayan, Krishnaprasad
    SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, 2010, 6187 : 461 - +
  • [30] Managing Scientific Information and Research Data
    Martinez, Diane
    TECHNICAL COMMUNICATION, 2016, 63 (03) : 287 - 288