AProvBio: An Architecture for Data Provenance in Bioinformatics Workflows using Graph Database

Authors
Almeida, Rodrigo [1]
da Silva, Waldeyr [1]
Castro, Klayton [1]
Walter, Maria Emilia [1]
Araujo, Aleteia [1]
Holanda, Maristela [1]
Lifschitz, Sergio [2]
Affiliations
[1] Univ Brasilia, Dept Comp Sci, Brasilia, Brazil
[2] Pontificia Univ Catolica Rio de Janeiro, Dept Informat, Rio De Janeiro, Brazil
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Many scientific experiments in bioinformatics are executed as computational workflows. It is frequently necessary to re-run an experiment under its original conditions so that its results can be recognized and validated. Data provenance concerns the origin of data. Knowing the data source facilitates the understanding and analysis of results by documenting in detail the history and paths of the input data from the beginning to the end of an experiment. In this context, data provenance can therefore be used to make experiments traceable. This paper presents AProvBio, an architecture that automatically captures the data provenance of scientific experiments in bioinformatics, using the PROV-DM provenance data model and a graph database. The architecture captures three kinds of provenance automatically: prospective, retrospective, and user-defined. It thus captures and stores information obtained during the execution of the data generation processes together with user-defined information, such as the features and versions of the programs used. A graph model based on PROV-DM is proposed for storing the provenance data. Because PROV-DM can be represented as a graph, a graph database allows more natural modelling, lets queries be expressed at a more natural level, and supports efficient algorithms for specific operations.
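To make the idea of storing PROV-DM provenance as a graph concrete, the following is a minimal sketch and not the AProvBio implementation. It assumes a tiny in-memory graph in place of a real graph database, and every workflow name in it (reads.fastq, filter_run, aligner, and so on) is hypothetical. Only the node types (Entity, Activity, Agent) and the relation names come from PROV-DM.

```python
# PROV-DM core node types and a few of its core relations.
NODE_TYPES = {"Entity", "Activity", "Agent"}
RELATIONS = {"used", "wasGeneratedBy", "wasDerivedFrom", "wasAssociatedWith"}
# Relations that lead from a result back toward its inputs.
LINEAGE_RELATIONS = {"used", "wasGeneratedBy", "wasDerivedFrom"}


class ProvGraph:
    """Tiny in-memory property graph; a stand-in for a real graph database."""

    def __init__(self):
        self.nodes = {}  # node id -> {"type": ..., plus user-defined properties}
        self.edges = {}  # source id -> list of (relation, target id)

    def add_node(self, node_id, node_type, **props):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown PROV-DM type: {node_type}")
        self.nodes[node_id] = {"type": node_type, **props}

    def add_edge(self, source, relation, target):
        if relation not in RELATIONS:
            raise ValueError(f"unsupported relation: {relation}")
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.setdefault(source, []).append((relation, target))

    def lineage(self, entity_id):
        """Return the ids of all entities that entity_id was produced from."""
        seen = set()
        stack = [entity_id]
        while stack:
            current = stack.pop()
            for relation, target in self.edges.get(current, []):
                if relation in LINEAGE_RELATIONS and target not in seen:
                    seen.add(target)
                    stack.append(target)
        return {n for n in seen if self.nodes[n]["type"] == "Entity"}


# Hypothetical two-step workflow: filter the reads, then align them.
g = ProvGraph()
g.add_node("reads.fastq", "Entity")
g.add_node("filtered.fastq", "Entity")
g.add_node("aligned.sam", "Entity")
g.add_node("filter_run", "Activity")
g.add_node("align_run", "Activity")
g.add_node("aligner", "Agent", version="1.0")  # user-defined property

g.add_edge("filter_run", "used", "reads.fastq")
g.add_edge("filtered.fastq", "wasGeneratedBy", "filter_run")
g.add_edge("align_run", "used", "filtered.fastq")
g.add_edge("aligned.sam", "wasGeneratedBy", "align_run")
g.add_edge("align_run", "wasAssociatedWith", "aligner")

print(sorted(g.lineage("aligned.sam")))  # ['filtered.fastq', 'reads.fastq']
```

The lineage query is a plain graph traversal, which illustrates why a graph representation lets such questions be expressed naturally. In a graph database the same question would typically be a path query rather than hand-written traversal code.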
Pages: 2139 - 2144
Number of pages: 6
Related papers
  • [1] Storing provenance data of genome project workflows using graph database
    Pinheiro, Rodrigo
    Aires, Bruno
    Araujo, Aleteia F.
    Holanda, Maristela
    Walter, Maria Emilia
    Lifschitz, Sergio
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2014,
  • [2] Data Provenance Management for Bioinformatics Workflows using NoSQL Database Systems in a Cloud Computing Environment
    Hondo, Fernanda
    Wercelens, Polyane
    da Silva, Waldeyr
    Castro, Klayton
    Santana, Ingrid
    Walter, Maria Emilia
    Araujo, Aleteia
    Holanda, Maristela
    Lifschitz, Sergio
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2017, : 1929 - 1934
  • [3] Provenance in bioinformatics workflows
    de Paula, Renato
    Holanda, Maristela
    Gomes, Luciana S. A.
    Lifschitz, Sergio
    Walter, Maria Emilia M. T.
    [J]. BMC BIOINFORMATICS, 2013, 14
  • [4] Provenance in bioinformatics workflows
    Renato de Paula
    Maristela Holanda
    Luciana SA Gomes
    Sergio Lifschitz
    Maria Emilia MT Walter
    [J]. BMC Bioinformatics, 14
  • [5] Data Provenance Management of Bioinformatics Workflows in Federated Clouds
    Wercelens, Polyane
    da Silva, Waldeyr
    Castro, Klayton
    Araujo, Aleteia P. F.
    Lifschitz, Sergio
    Holanda, Maristela
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2019, : 750 - 754
  • [6] Bioinformatics Workflows With NoSQL Database in Cloud Computing
    Wercelens, Polyane
    da Silva, Waldeyr
    Hondo, Fernanda
    Castro, Klayton
    Walter, Maria Emilia
    Araujo, Aleteia
    Lifschitz, Sergio
    Holanda, Maristela
    [J]. EVOLUTIONARY BIOINFORMATICS, 2019, 15
  • [7] Provenance Framework for Twitter Data using Zero-Information Loss Graph Database
    Rani, Asma
    Goyal, Navneet
    Gadia, Shashi K.
    [J]. CODS-COMAD 2021: PROCEEDINGS OF THE 3RD ACM INDIA JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE & MANAGEMENT OF DATA (8TH ACM IKDD CODS & 26TH COMAD), 2021, : 74 - 82
  • [8] Implementation of a Web architecture for the execution of workflows in bioinformatics
    Barraza, Fernando
    Salazar, Gustavo
    Cuesta-Astroz, Yesid
    Restrepo, Oscar E.
    [J]. INGENIERIA Y COMPETITIVIDAD, 2006, 8 (02): : 34 - 45
  • [9] Data reduction in scientific workflows using provenance monitoring and user steering
    Souza, Renan
    Silva, Vitor
    Coutinho, Alvaro L. G. A.
    Valduriez, Patrick
    Mattoso, Marta
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2020, 110 (110): : 481 - 501
  • [10] RESTful Open Workflows for Data Provenance and Reuse
    Eckert, Kai
    Ritze, Dominique
    Baierer, Konstantin
    Bizer, Christian
    [J]. WWW'14 COMPANION: PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON WORLD WIDE WEB, 2014, : 259 - 260