xGAP: a python based efficient, modular, extensible and fault tolerant genomic analysis pipeline for variant discovery

Authors
Gorla, Aditya [1]
Jew, Brandon [2]
Zhang, Luke [3]
Sul, Jae Hoon [4]
Affiliations
[1] Univ Calif Los Angeles, Dept Bioengn, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, Bioinformat Interdept Program, Los Angeles, CA 90095 USA
[3] Univ Calif Los Angeles, Undergrad Neurosci Interdept Program, Los Angeles, CA 90095 USA
[4] Univ Calif Los Angeles, Dept Psychiat & Biobehav Sci, Los Angeles, CA 90095 USA
Funding
U.S. National Science Foundation;
Keywords
FRAMEWORK; ALIGNMENT; ACCURATE;
DOI
10.1093/bioinformatics/btaa1097
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Motivation: Since the first human genome was sequenced in 2001, there has been rapid growth in the number of bioinformatic methods to process and analyze next-generation sequencing (NGS) data for research and clinical studies that aim to identify genetic variants influencing diseases and traits. To achieve this goal, one first needs to call genetic variants from NGS data, which requires multiple computationally intensive analysis steps. Unfortunately, there is no open-source pipeline that can perform all these steps on NGS data in a manner that is fully automated, efficient, rapid, scalable, modular, user-friendly and fault tolerant. To address this, we introduce xGAP, an extensible Genome Analysis Pipeline, which implements modified GATK best practice to analyze DNA-seq data with the aforementioned functionalities.
Results: xGAP implements massive parallelization of the modified GATK best practice pipeline by splitting a genome into many smaller regions with efficient load-balancing to achieve high scalability. It can process 30x coverage whole-genome sequencing (WGS) data in ~90 min. In terms of accuracy of discovered variants, xGAP achieves average F1 scores of 99.37% for single nucleotide variants and 99.20% for insertions/deletions across seven benchmark WGS datasets. We achieve highly consistent results across multiple on-premises (SGE & SLURM) high-performance clusters. Compared to the Churchill pipeline, with similar parallelization, xGAP is 20% faster when analyzing 50x coverage WGS on Amazon Web Services. Finally, xGAP is user-friendly and fault tolerant: it can automatically re-initiate failed processes to minimize required user intervention.
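Illustrative sketch (not taken from the xGAP code base): the abstract names three ideas, splitting a genome into smaller regions, balancing those regions across workers, and re-running failed processes. The Python below shows one simple way these could fit together. The function names, the greedy longest-first balancing heuristic, and the retry limit are all assumptions chosen for illustration; the abstract does not say how xGAP implements any of them.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Hypothetical region type: (chromosome, start, end), 0-based and half-open.
Region = Tuple[str, int, int]


def split_genome(chrom_lengths: Dict[str, int], region_size: int) -> List[Region]:
    """Cut each chromosome into consecutive regions of at most region_size bases."""
    regions: List[Region] = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, region_size):
            regions.append((chrom, start, min(start + region_size, length)))
    return regions


def balance(regions: List[Region], n_workers: int) -> List[List[Region]]:
    """Greedy longest-first assignment: each region goes to the least-loaded worker."""
    heap = [(0, w) for w in range(n_workers)]  # (bases assigned so far, worker index)
    heapq.heapify(heap)
    bins: List[List[Region]] = [[] for _ in range(n_workers)]
    for region in sorted(regions, key=lambda r: r[2] - r[1], reverse=True):
        load, worker = heapq.heappop(heap)
        bins[worker].append(region)
        heapq.heappush(heap, (load + region[2] - region[1], worker))
    return bins


def run_with_retry(task: Callable[[Region], str], region: Region,
                   max_attempts: int = 3) -> str:
    """Re-run a failed region task, re-raising the error after max_attempts failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(region)
        except Exception:
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")
```

Region length is used here as a stand-in for workload; a real pipeline would more plausibly weight regions by read depth or expected runtime.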
Pages: 9-16
Page count: 8