xAtlas: scalable small variant calling across heterogeneous next-generation sequencing experiments

Cited by: 3
Authors
Farek, Jesse [1]
Hughes, Daniel [1, 2]
Salerno, William [1, 3]
Zhu, Yiming [1]
Pisupati, Aishwarya [1]
Mansfield, Adam [1, 3]
Krasheninina, Olga [1, 3]
English, Adam C. [1]
Metcalf, Ginger [1]
Boerwinkle, Eric [1, 4]
Muzny, Donna M. [1]
Gibbs, Richard [1]
Khan, Ziad [1]
Sedlazeck, Fritz J. [1]
Affiliations
[1] Baylor Coll Med, Human Genome Sequencing Ctr, One Baylor Plaza, Houston, TX 77030 USA
[2] Columbia Univ, Inst Genom Med, New York, NY USA
[3] Regeneron Pharmaceut Inc, Tarrytown, NY USA
[4] Univ Texas Hlth Sci Ctr Houston, Human Genet Ctr, El Paso, TX USA
Source
GIGASCIENCE | 2023 / Volume 12
Keywords
GENOTYPE; GENOMES; SNP
DOI
10.1093/gigascience/giac125
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Background: The growing volume and heterogeneity of next-generation sequencing (NGS) data complicate the further optimization of identifying DNA variation, especially considering that curated high-confidence variant call sets frequently used to validate these methods are generally developed from the analysis of comparatively small and homogeneous sample sets.

Findings: We have developed xAtlas, a single-sample variant caller for single-nucleotide variants (SNVs) and small insertions and deletions (indels) in NGS data. xAtlas features rapid runtimes, support for CRAM and gVCF file formats, and retraining capabilities. xAtlas reports SNVs with 99.11% recall and 98.43% precision across a reference HG002 sample at 60x whole-genome coverage in less than 2 CPU hours. Applying xAtlas to 3,202 samples at 30x whole-genome coverage from the 1000 Genomes Project achieves an average runtime of 1.7 hours per sample and a clear separation of the individual populations in principal component analysis across called SNVs.

Conclusions: xAtlas is a fast, lightweight, and accurate SNV and small indel calling method. Source code for xAtlas is available under a BSD 3-clause license at https://github.com/jfarek/xatlas.
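
The recall and precision figures quoted in the abstract follow the standard definitions used when a call set is compared against a truth set. The sketch below is not taken from xAtlas or from the paper; the function names and the example counts are hypothetical and serve only to illustrate how these metrics, and the F1 score that combines them, are derived.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from variant-comparison counts.

    tp: calls that match a truth-set variant
    fp: calls with no matching truth-set variant
    fn: truth-set variants that were not called
    """
    if min(tp, fp, fn) < 0:
        raise ValueError("counts must be non-negative")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from_rates(precision, recall)


def f1_from_rates(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to make the arithmetic easy to follow.
    p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
    print(f"precision={p:.4f} recall={r:.4f} F1={f1:.4f}")

    # F1 implied by the SNV precision and recall stated in the abstract.
    print(f"abstract F1={f1_from_rates(0.9843, 0.9911):.4f}")
```

Applied to the abstract's SNV figures (98.43% precision, 99.11% recall), the harmonic mean comes out at roughly 0.988; the abstract itself does not report an F1 value.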
Pages: 7