AiiDA 1.0, a scalable computational infrastructure for automated reproducible workflows and data provenance

Authors
Sebastiaan P. Huber
Spyros Zoupanos
Martin Uhrin
Leopold Talirz
Leonid Kahle
Rico Häuselmann
Dominik Gresch
Tiziano Müller
Aliaksandr V. Yakutovich
Casper W. Andersen
Francisco F. Ramirez
Carl S. Adorf
Fernando Gargiulo
Snehal Kumbhar
Elsa Passaro
Conrad Johnston
Andrius Merkys
Andrea Cepellotti
Nicolas Mounet
Nicola Marzari
Boris Kozinsky
Giovanni Pizzi
Affiliations
[1] National Centre for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry
[2] École Polytechnique Fédérale de Lausanne, John A. Paulson School of Engineering and Applied Sciences
[3] Theory and Simulation of Materials (THEOS)
[4] Faculté des Sciences et Techniques de l’Ingénieur
[5] École Polytechnique Fédérale de Lausanne
[6] Laboratory of Molecular Simulation (LSMO)
[7] Institut des Sciences et Ingénierie Chimiques
[8] École Polytechnique Fédérale de Lausanne (EPFL)
[9] Rue de l’Industrie 17
[10] Microsoft Station Q
[11] University of California
[12] University of Zürich
[13] Vilnius University Institute of Biotechnology
[14] Saulėtekio al. 7
[15] Harvard University
[16] Robert Bosch LLC
[17] Research and Technology Center North America
[18] 255 Main St
Abstract
The ever-growing availability of computing power and the sustained development of advanced computational methods have contributed much to recent scientific progress. These developments present new challenges driven by the sheer number of calculations and the volume of data to manage. Next-generation exascale supercomputers will intensify these challenges, such that automated and scalable solutions become crucial. In recent years, we have been developing AiiDA (aiida.net), a robust open-source high-throughput infrastructure addressing the challenges arising from the needs of automated workflow management and data provenance recording. Here, we introduce developments and capabilities required to reach sustained performance, with AiiDA supporting throughputs of tens of thousands of processes per hour, while automatically preserving and storing the full data provenance in a relational database, making it queryable and traversable and thus enabling high-performance data analytics. AiiDA’s workflow language provides advanced automation, error-handling features and a flexible plugin model to allow interfacing with external simulation software. The associated plugin registry enables seamless sharing of extensions, empowering a vibrant user community dedicated to making simulations more robust, user-friendly and reproducible.
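To make the idea of a provenance record that is "queryable and traversable" in a relational database more concrete, the following is a minimal sketch using only Python's standard `sqlite3` module. It is not AiiDA's schema or API: the `node` and `link` tables, the `build_provenance_db` and `ancestors` helpers, and the example labels are all hypothetical and chosen purely for illustration. The sketch stores a small directed graph of data and process nodes and walks upstream from one result with a recursive query.

```python
import sqlite3


def build_provenance_db():
    """Create an in-memory database holding a toy provenance graph.

    Hypothetical schema (not AiiDA's): `node` holds data and process
    records, `link` holds directed edges from an input to its consumer
    or from a process to its output.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE node (
            id    INTEGER PRIMARY KEY,
            kind  TEXT NOT NULL,
            label TEXT NOT NULL
        );
        CREATE TABLE link (
            source INTEGER NOT NULL REFERENCES node(id),
            target INTEGER NOT NULL REFERENCES node(id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO node VALUES (?, ?, ?)",
        [
            (1, "data", "input structure"),
            (2, "process", "relaxation"),
            (3, "data", "relaxed structure"),
            (4, "process", "band calculation"),
            (5, "data", "band structure"),
            (6, "data", "unrelated input"),
        ],
    )
    conn.executemany(
        "INSERT INTO link VALUES (?, ?)",
        [(1, 2), (2, 3), (3, 4), (4, 5)],
    )
    return conn


def ancestors(conn, node_id):
    """Return every node upstream of `node_id`, ordered by id."""
    return conn.execute(
        """
        WITH RECURSIVE upstream(id) AS (
            SELECT source FROM link WHERE target = ?
            UNION
            SELECT link.source
            FROM link JOIN upstream ON link.target = upstream.id
        )
        SELECT node.id, node.kind, node.label
        FROM node JOIN upstream ON node.id = upstream.id
        ORDER BY node.id
        """,
        (node_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_provenance_db()
    # List everything that contributed to the final result (node 5).
    for row in ancestors(conn, 5):
        print(row)
```

Because the whole history lives in ordinary tables, the same data can be filtered, joined, and aggregated with standard SQL, which is the general property the abstract points to when it mentions data analytics over the provenance graph.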
Related papers
16 in total
  • [1] AiiDA 1.0, a scalable computational infrastructure for automated reproducible workflows and data provenance
    Huber, Sebastiaan P.
    Zoupanos, Spyros
    Uhrin, Martin
    Talirz, Leopold
    Kahle, Leonid
    Haeuselmann, Rico
    Gresch, Dominik
    Mueller, Tiziano
    Yakutovich, Aliaksandr V.
    Andersen, Casper W.
    Ramirez, Francisco F.
    Adorf, Carl S.
    Gargiulo, Fernando
    Kumbhar, Snehal
    Passaro, Elsa
    Johnston, Conrad
    Merkys, Andrius
    Cepellotti, Andrea
    Mounet, Nicolas
    Marzari, Nicola
    Kozinsky, Boris
    Pizzi, Giovanni
    [J]. SCIENTIFIC DATA, 2020, 7 (01)
  • [2] Automated reproducible workflows and data provenance with AiiDA
    Sebastiaan P. Huber
    [J]. Nature Reviews Physics, 2022, 4 : 431 - 431
  • [3] Automated reproducible workflows and data provenance with AiiDA
    Huber, Sebastiaan P.
    [J]. NATURE REVIEWS PHYSICS, 2022, 4 (07) : 431 - 431
  • [4] AiiDA: automated interactive infrastructure and database for computational science
    Pizzi, Giovanni
    Cepellotti, Andrea
    Sabatini, Riccardo
    Marzari, Nicola
    Kozinsky, Boris
    [J]. COMPUTATIONAL MATERIALS SCIENCE, 2016, 111 : 218 - 230
  • [5] Scalable Provenance Storage and Querying Using Pig Latin for Big Data Workflows
    Bhuyan, Fahima
    Lu, Shiyong
    Ruan, Dong
    Zhang, Jia
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON SERVICES COMPUTING (SCC), 2017, : 459 - 466
  • [6] OpenCyto: An Open Source Infrastructure for Scalable, Robust, Reproducible, and Automated, End-to-End Flow Cytometry Data Analysis
    Finak, Greg
    Frelinger, Jacob
    Jiang, Wenxin
    Newell, Evan W.
    Ramey, John
    Davis, Mark M.
    Kalams, Spyros A.
    De Rosa, Stephen C.
    Gottardo, Raphael
    [J]. PLOS COMPUTATIONAL BIOLOGY, 2014, 10 (08)
  • [7] INTRODUCING DATA PROVENANCE AND ERROR HANDLING FOR NGS WORKFLOWS WITHIN THE MOLGENIS COMPUTATIONAL FRAMEWORK
    Byelas, H. V.
    Dijkstra, M.
    Swertz, M. A.
    [J]. BIOINFORMATICS: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON BIOINFORMATICS MODELS, METHODS AND ALGORITHMS, 2012, : 42 - 50
  • [8] A Distributed, Scalable and Provenance-enabled Data Access Protocol for Spatial Data Infrastructure
    Warekuromor, Tubolayefa
    James, Anne
    Anifowose, Babatunde
    Trodd, Nigel
    [J]. 2017 IEEE 21ST INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN (CSCWD), 2017, : 180 - 185
  • [9] Scalable Infrastructure Supporting Reproducible Nationwide Healthcare Data Analysis toward FAIR Stewardship
    Ji-Woo Kim
    Chungsoo Kim
    Kyoung-Hoon Kim
    Yujin Lee
    Dong Han Yu
    Jeongwon Yun
    Hyeran Baek
    Rae Woong Park
    Seng Chan You
    [J]. Scientific Data, 10
  • [10] Scalable Infrastructure Supporting Reproducible Nationwide Healthcare Data Analysis toward FAIR Stewardship
    Kim, Ji-Woo
    Kim, Chungsoo
    Kim, Kyoung-Hoon
    Lee, Yujin
    Yu, Dong Han
    Yun, Jeongwon
    Baek, Hyeran
    Park, Rae Woong
    You, Seng Chan
    [J]. SCIENTIFIC DATA, 2023, 10 (01)