Semantic workflows for benchmark challenges: Enhancing comparability, reusability and reproducibility

Times cited: 0
Authors
Srivastava, Arunima [1]
Adusumilli, Ravali [2]
Boyce, Hunter [2]
Garijo, Daniel [3]
Ratnakar, Varun [3]
Mayani, Rajiv [3]
Yu, Thomas [4]
Machiraju, Raghu [1]
Gil, Yolanda [3]
Mallick, Parag [2]
Affiliations
[1] Ohio State Univ, Comp Sci & Engn, 2015 Neil Ave, Columbus, OH 43210 USA
[2] Stanford Univ, Canary Ctr Canc Early Detect, 3155 Porter Dr, Palo Alto, CA 94305 USA
[3] Univ Southern Calif, Informat Sci Inst, Los Angeles, CA 90292 USA
[4] Sage Bionetworks, 2901 Third Ave, Suite 330, Seattle, WA 98121 USA
Keywords
Workflows; Semantic Workflows; DREAM Challenges; Proteogenomics; Benchmarking; Big; EXPRESSION ANALYSIS; GENE
DOI
Not available
Chinese Library Classification
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Benchmark challenges, such as the Critical Assessment of Structure Prediction (CASP) and the Dialogue for Reverse Engineering Assessments and Methods (DREAM), have been instrumental in driving the development of bioinformatics methods. Typically, a challenge is posted, competitors make predictions on blinded test data, and they then submit their answers to a central server where the answers are scored. Recent efforts to automate these challenges have been enabled by systems in which challengers submit Docker containers (units of software that package code together with all of its dependencies) to be run in the cloud. Despite their great value as unbiased test beds for the bioinformatics community, benchmark challenges still offer opportunities to further increase their impact. Specifically, current approaches evaluate only end-to-end performance, so it is nearly impossible to directly compare methodologies or parameters. Furthermore, the scientific community cannot easily reuse challengers' approaches because of missing details, ambiguity about tools and parameters, and difficulties in sharing and maintenance. Finally, the rationale for using particular steps is not captured: because the submitted workflows are not explicitly defined, it is cumbersome to understand how data flow through them and how they are used. Here we introduce an approach, based on the WINGS semantic workflow system, that overcomes these limitations. Specifically, WINGS enables researchers to submit complete semantic workflows as challenge entries. Because entries are submitted as workflows, it becomes possible to compare not just a challenger's results and performance but also the methodology employed. This is particularly important when dozens of challenge entries use nearly identical tools with only subtle changes in parameters (and radically different results).
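To make the idea of comparing methodology, not just scores, more concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each entry is an ordered list of (component name, parameters) pairs; this representation, the function `diff_workflows`, and the component names `Normalize` and `Impute` are hypothetical and are not part of WINGS.

```python
from typing import Any

# Hypothetical representation of a challenge entry: an ordered list of
# steps, each naming a component and its parameter settings.
Workflow = list[tuple[str, dict[str, Any]]]


def diff_workflows(a: Workflow, b: Workflow) -> list[str]:
    """Report step-by-step differences between two workflow submissions."""
    diffs: list[str] = []
    if len(a) != len(b):
        diffs.append(f"step count differs: {len(a)} vs {len(b)}")
    for i, ((comp_a, params_a), (comp_b, params_b)) in enumerate(zip(a, b)):
        if comp_a != comp_b:
            diffs.append(f"step {i}: component {comp_a!r} vs {comp_b!r}")
            continue
        # Same component: compare every parameter that either entry sets.
        for key in sorted(params_a.keys() | params_b.keys()):
            va, vb = params_a.get(key), params_b.get(key)
            if va != vb:
                diffs.append(f"step {i} ({comp_a}): {key} = {va!r} vs {vb!r}")
    return diffs


entry_1 = [("Normalize", {"method": "median"}), ("Impute", {"k": 5})]
entry_2 = [("Normalize", {"method": "median"}), ("Impute", {"k": 10})]
print(diff_workflows(entry_1, entry_2))  # ['step 1 (Impute): k = 5 vs 10']
```

Two entries that would look unrelated on a leaderboard can thus be shown to differ in a single parameter, which is the kind of comparison that end-to-end scoring alone cannot reveal.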
WINGS uses a component-driven workflow design and offers intelligent parameter and data selection by reasoning about data characteristics. This is especially critical in bioinformatics workflows, where default or incorrect parameter values can drastically alter results. Different challenge entries can be readily compared through abstract workflows, which also facilitate reuse. WINGS is hosted on a cloud-based setup that stores data, dependencies, and workflows for easy sharing and use. It can also scale workflow executions through distributed computing using the Pegasus workflow execution system. We demonstrate the application of this architecture to the DREAM proteogenomics challenge.
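The sketch below illustrates, under simplifying assumptions, two ideas from the paragraph above: collapsing concrete entries into abstract workflows so that they can be compared, and filling parameters from characteristics of the input data. The `Step` class, the rule format, the tool names, and the `log_transformed`/`log_transform` keys are all hypothetical. WINGS itself represents this knowledge semantically rather than as Python objects, so treat this only as an analogy.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Step:
    """One step of a concrete workflow (hypothetical structure, not the WINGS API)."""
    tool: str                 # concrete tool chosen by the challenger
    abstract: str             # abstract component class the tool implements
    params: dict[str, Any] = field(default_factory=dict)


def abstract_signature(workflow: list[Step]) -> tuple[str, ...]:
    """Collapse a concrete workflow into its sequence of abstract component classes."""
    return tuple(step.abstract for step in workflow)


def same_method(a: list[Step], b: list[Step]) -> bool:
    """True when two entries instantiate the same abstract workflow."""
    return abstract_signature(a) == abstract_signature(b)


# A hypothetical parameter rule: (abstract class, test on dataset metadata,
# parameter name, value to set when the test passes).
Rule = tuple[str, Callable[[dict[str, Any]], bool], str, Any]


def set_parameters(workflow: list[Step], data_meta: dict[str, Any],
                   rules: list[Rule]) -> list[Step]:
    """Fill unset parameters from rules about the input data's characteristics."""
    for step in workflow:
        for cls, applies, name, value in rules:
            # Only fill parameters the challenger left unset, so explicit
            # choices are never overridden by a rule.
            if step.abstract == cls and name not in step.params and applies(data_meta):
                step.params[name] = value
    return workflow


entry_a = [Step("tool_x_norm", "Normalizer"), Step("tool_x_impute", "Imputer", {"k": 5})]
entry_b = [Step("tool_y_norm", "Normalizer"), Step("tool_y_impute", "Imputer", {"k": 10})]
print(same_method(entry_a, entry_b))  # True: same abstract workflow, different tools

rules: list[Rule] = [
    ("Normalizer", lambda meta: not meta["log_transformed"], "log_transform", True),
]
set_parameters(entry_a, {"log_transformed": False}, rules)
print(entry_a[0].params)  # {'log_transform': True}
```

Comparing at the abstract level groups entries that follow the same method even when they pick different tools, while the rule step shows, in miniature, how a parameter can be derived from the data rather than left at a default.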
Pages: 208-219
Page count: 12