Variant Callers for Next-Generation Sequencing Data: A Comparison Study

Times cited: 112
Authors
Liu, Xiangtao [1, 2]
Han, Shizhong [1, 2]
Wang, Zuoheng [3]
Gelernter, Joel [1, 2, 4, 5]
Yang, Bao-Zhu [1, 2]
Affiliations
[1] Yale Univ, Sch Med, Dept Psychiat, Div Human Genet, New Haven, CT 06520 USA
[2] VA CT Hlth Care Ctr, West Haven, CT USA
[3] Yale Univ, Sch Publ Hlth, Dept Biostat, New Haven, CT USA
[4] Yale Univ, Sch Med, Dept Genet, New Haven, CT 06510 USA
[5] Yale Univ, Sch Med, Dept Neurobiol, New Haven, CT USA
Source
PLOS ONE | 2013 / Vol. 8 / Issue 9
Funding
National Institutes of Health (NIH)
Keywords
MAPREDUCE; FRAMEWORK; GENOTYPE; FORMAT
DOI
10.1371/journal.pone.0075619
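Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]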
Discipline classification codes
07; 0710; 09
Abstract
Next-generation sequencing (NGS) has been leading the genetic study of human disease into an era of unprecedented productivity. Many bioinformatics pipelines have been developed to call variants from NGS data, and their performance depends crucially on the variant caller used and on the calling strategy implemented. We studied the performance of four prevailing callers, SAMtools, GATK, glftools and Atlas2, using single-sample and multiple-sample variant-calling strategies. Using the same aligner, BWA, we built four single-sample and three multiple-sample calling pipelines and applied them to whole-exome sequencing data from 20 individuals. We obtained genotypes generated by the Illumina Infinium HumanExome v1.1 BeadChip for validation analysis and then used Sanger sequencing as a "gold-standard" method to resolve discrepancies in selected regions of high discordance. Finally, we compared the sensitivity of three of the single-sample calling pipelines using known simulated whole-genome sequence data as a gold standard. Overall, for single-sample calling, the called variants were highly consistent across callers, with a pairwise overlapping rate of about 0.9. Compared with the other callers, GATK had the highest rediscovery rate (0.9969) and specificity (0.99996), and the Ti/Tv ratio of the GATK calls was closest to the expected value of 3.02. Multiple-sample calling increased sensitivity. Results from the simulated data suggested that GATK outperformed SAMtools and glfSingle in sensitivity, especially for low-coverage data. Further, for the selected discrepant regions evaluated by Sanger sequencing, variant genotypes called from exome sequencing were more accurate than those from the exome array, although the average variant sensitivity and overall genotype consistency rate were as high as 95.87% and 99.82%, respectively. In conclusion, GATK showed several advantages over other variant callers for general-purpose NGS analyses. The GATK pipelines we developed perform very well.
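The abstract compares callers through a few summary statistics. The Python sketch below illustrates how such statistics are commonly computed from call sets; it is not the authors' code, and all function names are hypothetical. The Ti/Tv ratio follows the standard definition (A<->G and C<->T substitutions are transitions, all other base substitutions are transversions). The definitions used for `overlap_rate` (a Jaccard index over `(chrom, pos, ref, alt)` keys) and `rediscovery_rate` (the share of array-detected variant sites that sequencing also calls) are assumptions for illustration and may differ from those used in the paper.

```python
from typing import Iterable, Tuple

# Transitions swap purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T);
# every other single-base substitution is a transversion.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}
BASES = set("ACGT")

# Hypothetical variant key used for illustration: (chrom, pos, ref, alt).
Variant = Tuple[str, int, str, str]


def ti_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Ti/Tv over (ref, alt) allele pairs; non-SNV records are skipped."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        # Skip indels, multi-base records, symbolic alleles and ref == alt.
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if frozenset((ref, alt)) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        return float("inf") if ti else float("nan")
    return ti / tv


def overlap_rate(calls_a: Iterable[Variant], calls_b: Iterable[Variant]) -> float:
    """ASSUMED definition: shared calls / all distinct calls (Jaccard index)."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rediscovery_rate(array_sites: Iterable[Tuple[str, int]],
                     seq_sites: Iterable[Tuple[str, int]]) -> float:
    """ASSUMED definition: fraction of array-detected variant sites
    (chrom, pos) that sequencing also calls as variant."""
    truth = set(array_sites)
    return len(truth & set(seq_sites)) / len(truth) if truth else 0.0


if __name__ == "__main__":
    # Toy data only; not results from the study.
    snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "G"), ("AT", "A")]
    print(ti_tv_ratio(snvs))  # 3 transitions / 2 transversions -> 1.5
    caller_a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
    caller_b = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"), ("chr3", 10, "T", "C")}
    print(overlap_rate(caller_a, caller_b))  # 2 shared / 4 distinct -> 0.5
```

For an exome call set, the computed Ti/Tv would be compared against the expected value quoted in the abstract (3.02). Random errors tend to pull the ratio down, because there are twice as many possible transversions as transitions, so a markedly lower ratio is a common warning sign of false-positive calls.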
Pages: 11