From the desktop to the grid: scalable bioinformatics via workflow conversion

Cited by: 10
Authors
de la Garza, Luis [1, 2]
Veit, Johannes [1, 2]
Szolek, Andras [1, 2]
Roettig, Marc [1, 2]
Aiche, Stephan [3 ]
Gesing, Sandra [4 ]
Reinert, Knut [3 ]
Kohlbacher, Oliver [1, 2]
Affiliations
[1] Univ Tubingen, Ctr Bioinformat, Sand 14, D-72070 Tubingen, Germany
[2] Univ Tubingen, Dept Comp Sci, Sand 14, D-72070 Tubingen, Germany
[3] Free Univ Berlin, Inst Comp Sci, Algorithm Bioinformat, Takustr 9, D-14195 Berlin, Germany
[4] Univ Notre Dame, Coll Engn, 257 Fitzpatrick Hall, Notre Dame, IN 46556 USA
Source
BMC Bioinformatics | 2016 / Volume 17
Keywords
Workflow; Interoperability; KNIME; Grid; Cloud; Galaxy; gUSE; Mass spectrometry; Proteomics; Framework; Tandem
DOI
10.1186/s12859-016-0978-9
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Reproducibility is one of the tenets of the scientific method. Scientific experiments often comprise complex data flows, selection of adequate parameters, and analysis and visualization of intermediate and end results. Breaking down the complexity of such experiments into the joint collaboration of small, repeatable, well-defined tasks, each with well-defined inputs, parameters, and outputs, offers immediate benefits such as identifying bottlenecks and pinpointing sections that could benefit from parallelization. Workflows rest upon the notion of splitting complex work into the joint effort of several manageable tasks. There are several engines that give users the ability to design and execute workflows. Each engine was created to address certain problems of a specific community; therefore, each one has its advantages and shortcomings. Furthermore, not all features of all workflow engines are royalty-free, an aspect that could potentially drive away members of the scientific community.
Results: We have developed a set of tools that enables the scientific community to benefit from workflow interoperability. We developed a platform-free structured representation of the parameters, inputs, and outputs of command-line tools in so-called Common Tool Descriptor documents. We have also overcome the shortcomings and combined the features of two royalty-free workflow engines with a substantial user community: the Konstanz Information Miner (KNIME), an engine which we see as a formidable workflow editor, and the Grid and User Support Environment (gUSE), a web-based framework able to interact with several high-performance computing resources. We have thus created a free and highly accessible way to design workflows on a desktop computer and execute them on high-performance computing resources.
Conclusions: Our work will not only reduce time spent on designing scientific workflows, but also make executing workflows on remote high-performance computing resources more accessible to technically inexperienced users. We strongly believe that our efforts not only decrease the turnaround time to obtain scientific results but also have a positive impact on reproducibility, thus elevating the quality of obtained scientific results.
Pages: 12