Evaluating Variant Calling Tools for Non-Matched Next-Generation Sequencing Data

Cited by: 141
Authors
Sandmann, Sarah [1]
de Graaf, Aniek O. [2]
Karimi, Mohsen [3]
van der Reijden, Bert A. [2]
Hellström-Lindberg, Eva [3]
Jansen, Joop H. [2]
Dugas, Martin [1]
Affiliations
[1] Univ Munster, Inst Med Informat, D-48149 Munster, Germany
[2] RadboudUMC, Lab Hematol, NL-6525 Nijmegen, Netherlands
[3] Karolinska Inst, Dept Med Huddinge, Ctr Hematol & Regenerat Med, S-14186 Stockholm, Sweden
Source
SCIENTIFIC REPORTS | 2017 / Volume 7
Keywords
GENOME; DISCOVERY; MUTATION; CANCER;
DOI
10.1038/srep43169
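Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];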
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Valid variant calling results are crucial for the use of next-generation sequencing in clinical routine. However, there are numerous variant calling tools that usually differ in algorithms, filtering strategies, recommendations and thus, also in the output. We evaluated eight open-source tools regarding their ability to call single nucleotide variants and short indels with allelic frequencies as low as 1% in non-matched next-generation sequencing data: GATK HaplotypeCaller, Platypus, VarScan, LoFreq, FreeBayes, SNVer, SAMtools and VarDict. We analysed two real datasets from patients with myelodysplastic syndrome, covering 54 Illumina HiSeq samples and 111 Illumina NextSeq samples. Mutations were validated by re-sequencing on the same platform, on a different platform and expert-based review. In addition, we considered two simulated datasets with varying coverage and error profiles, covering 50 samples each. In all cases an identical target region consisting of 19 genes (42,322 bp) was analysed. Altogether, no tool succeeded in calling all mutations. High sensitivity was always accompanied by low precision. Influence of varying coverage and background noise on variant calling was generally low. Taking everything into account, VarDict performed best. However, our results indicate that there is a need to improve reproducibility of the results in the context of multithreading.
Pages: 12