A posteriori metadata from automated provenance tracking: integration of AiiDA and TCOD

Cited by: 16
Authors
Merkys, Andrius [1, 2, 3]
Mounet, Nicolas [1, 2]
Cepellotti, Andrea [1, 2]
Marzari, Nicola [1, 2]
Grazulis, Saulius [3, 4]
Pizzi, Giovanni [1, 2]
Affiliations
[1] Theory & Simulat Mat THEOS, CH-1015 Lausanne, Switzerland
[2] Natl Ctr Computat Design & Discovery Novel Mat MA, CH-1015 Lausanne, Switzerland
[3] Vilnius Univ, Inst Biotechnol, Sauletekio 7, LT-10257 Vilnius, Lithuania
[4] Vilnius Univ, Fac Math & Informat, Naugarduko St 24, LT-03225 Vilnius, Lithuania
Funding
Swiss National Science Foundation
Keywords
DFT; Reproducibility; Provenance; Open data; Ontology; Materials science; CRYSTAL-STRUCTURE DATABASE; REPRODUCIBLE RESEARCH; CIF; INFORMATION; VERSION; FILE;
DOI
10.1186/s13321-017-0242-y
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
To make the results of computational scientific research findable, accessible, interoperable and re-usable, it is necessary to decorate them with standardised metadata. However, a number of technical and practical challenges make this process difficult to achieve. Here the implementation of a protocol is presented to tag crystal structures with their computed properties, without the need for human intervention to curate the data. This protocol leverages the capabilities of AiiDA, an open-source platform to manage and automate scientific computational workflows, and the TCOD, an open-access database storing computed materials properties using a well-defined and exhaustive ontology. Based on these, the complete procedure to deposit computed data in the TCOD database is automated. All relevant metadata are extracted from the full provenance information that AiiDA tracks and stores automatically while managing the calculations. Such a protocol also enables reproducibility of scientific data in the field of computational materials science. As a proof of concept, the AiiDA-TCOD interface is used to deposit 170 theoretical structures together with their computed properties and their full provenance graphs, consisting of over 4600 AiiDA nodes.
Pages: 11
Related papers
50 in total
  • [1] A posteriori metadata from automated provenance tracking: integration of AiiDA and TCOD
    Andrius Merkys
    Nicolas Mounet
    Andrea Cepellotti
    Nicola Marzari
    Saulius Gražulis
    Giovanni Pizzi
    [J]. Journal of Cheminformatics, 9
  • [2] Automated reproducible workflows and data provenance with AiiDA
    Sebastiaan P. Huber
    [J]. Nature Reviews Physics, 2022, 4 : 431 - 431
  • [3] Automated reproducible workflows and data provenance with AiiDA
    Huber, Sebastiaan P.
    [J]. NATURE REVIEWS PHYSICS, 2022, 4 (07) : 431 - 431
  • [4] AiiDA 1.0, a scalable computational infrastructure for automated reproducible workflows and data provenance
    Sebastiaan P. Huber
    Spyros Zoupanos
    Martin Uhrin
    Leopold Talirz
    Leonid Kahle
    Rico Häuselmann
    Dominik Gresch
    Tiziano Müller
    Aliaksandr V. Yakutovich
    Casper W. Andersen
    Francisco F. Ramirez
    Carl S. Adorf
    Fernando Gargiulo
    Snehal Kumbhar
    Elsa Passaro
    Conrad Johnston
    Andrius Merkys
    Andrea Cepellotti
    Nicolas Mounet
    Nicola Marzari
    Boris Kozinsky
    Giovanni Pizzi
    [J]. Scientific Data, 7
  • [5] AiiDA 1.0, a scalable computational infrastructure for automated reproducible workflows and data provenance
    Huber, Sebastiaan P.
    Zoupanos, Spyros
    Uhrin, Martin
    Talirz, Leopold
    Kahle, Leonid
    Haeuselmann, Rico
    Gresch, Dominik
    Mueller, Tiziano
    Yakutovich, Aliaksandr V.
    Andersen, Casper W.
    Ramirez, Francisco F.
    Adorf, Carl S.
    Gargiulo, Fernando
    Kumbhar, Snehal
    Passaro, Elsa
    Johnston, Conrad
    Merkys, Andrius
    Cepellotti, Andrea
    Mounet, Nicolas
    Marzari, Nicola
    Kozinsky, Boris
    Pizzi, Giovanni
    [J]. SCIENTIFIC DATA, 2020, 7 (01)
  • [6] Ontology Based Tracking and Propagation of Provenance Metadata
    Vacura, Miroslav
    Svatek, Vojtech
    [J]. NETWORKED DIGITAL TECHNOLOGIES, PT 1, 2010, 87 : 489 - 496
  • [7] Extracting Provenance Metadata from Privacy Policies
    Pandit, Harshvardhan Jitendra
    O'Sullivan, Declan
    Lewis, Dave
    [J]. PROVENANCE AND ANNOTATION OF DATA AND PROCESSES, IPAW 2018, 2018, 11017 : 262 - 265
  • [8] Vamsa: Automated Provenance Tracking in Data Science Scripts
    Namaki, Mohammad Hossein
    Floratou, Avrilia
    Psallidas, Fotis
    Krishnan, Subru
    Agrawal, Ashvin
    Wu, Yinghui
    Zhu, Yiwen
    Weimer, Markus
    [J]. KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020, : 1542 - 1551
  • [9] Augmenting geospatial data provenance through metadata tracking in geospatial service chaining
    Yue, Peng
    Gong, Jianya
    Di, Liping
    [J]. COMPUTERS & GEOSCIENCES, 2010, 36 (03) : 270 - 281
  • [10] Automated Integration of Genomic Metadata with Sequence-to-Sequence Models
    Cannizzaro, Giuseppe
    Leone, Michele
    Bernasconi, Anna
    Canakoglu, Arif
    Carman, Mark J.
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: APPLIED DATA SCIENCE AND DEMO TRACK, ECML PKDD 2020, PT V, 2021, 12461 : 187 - 203