Systematic comparison of germline variant calling pipelines cross multiple next-generation sequencers

Times cited: 65
Authors
Chen, Jiayun [1,2]
Li, Xingsong [1,2]
Zhong, Hongbin [1,2]
Meng, Yuhuan [1,2]
Du, Hongli [1,2]
Affiliations
[1] South China Univ Technol, Sch Biol & Biol Engn, Guangzhou, Guangdong, Peoples R China
[2] South China Univ Technol, Dept Biomed Engn, Guangzhou, Guangdong, Peoples R China
Funding
China Postdoctoral Science Foundation
Keywords
GENOME; SETS; MAP
DOI
10.1038/s41598-019-45835-3
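Chinese Library Classification codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09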
Abstract
The development of next-generation sequencing (NGS) technologies and downstream analysis tools has made NGS widely used in scientific research and clinical diagnostics. A systematic comparison of sequencing platforms and variant calling pipelines could therefore provide important guidance for NGS-based research and clinical genomics. In this study, we compared the performance, concordance and operating efficiency of 27 combinations of sequencing platforms and variant calling pipelines, testing three pipelines (Genome Analysis Toolkit (GATK) HaplotypeCaller, Strelka2 and Samtools-VarScan2) on nine data sets of the NA12878 genome sequenced on different platforms, including BGISEQ-500, MGISEQ-2000, HiSeq 4000, NovaSeq and HiSeq X Ten. Of the 12 combinations applied to whole-exome sequencing (WES) datasets, all performed well in calling SNPs, with F-scores above 0.96, whereas their F-scores for INDEL calling ranged from 0.75 to 0.91. The 15 combinations applied to whole-genome sequencing (WGS) datasets also performed well, with SNP F-scores above 0.975 and INDEL F-scores ranging from 0.71 to 0.93. All combinations showed high concordance in variant identification, although the divergence between call sets was larger for WGS than for WES datasets. We also down-sampled the original WES and WGS datasets from each platform to a series of coverage gradients and recorded the time each of the three pipelines took to call variants at each coverage. For the Genome in a Bottle (GIAB) datasets from both BGI and Illumina platforms, Strelka2 outperformed the other two pipelines in both detection accuracy and processing efficiency on every sequencing platform, and we therefore recommend it for the further adoption and application of NGS technology.
Our results provide useful and comprehensive guidelines for individual and institutional researchers seeking reliable and consistent variant identification.
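The F-scores quoted in the abstract are, under the usual variant-benchmarking convention, the harmonic mean of precision and recall, computed from true-positive (TP), false-positive (FP) and false-negative (FN) counts against a truth set such as GIAB. The abstract does not state exactly how these counts were obtained, so the following is only a minimal sketch assuming that standard definition; the function name `f_score` and the example counts are hypothetical.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score, i.e. the harmonic mean of precision and recall.

    Assumes the standard benchmarking definitions (hypothetical helper):
      tp: calls that match the truth set (true positives)
      fp: calls not present in the truth set (false positives)
      fn: truth-set variants the caller missed (false negatives)
    """
    if tp == 0:
        # With no true positives, precision and recall are 0 (or undefined),
        # so the F-score is taken to be 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean; algebraically equal to 2*TP / (2*TP + FP + FN).
    return 2 * precision * recall / (precision + recall)
```

For example, 90 true positives with 10 false positives and 10 false negatives give precision = recall = 0.9 and therefore an F-score of 0.9. Because the harmonic mean is pulled toward the smaller of the two values, a caller cannot reach a high F-score by maximizing only sensitivity or only precision.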
Pages: 13