Benchmarking variant callers in next-generation and third-generation sequencing analysis

Cited by: 49
Authors
Pei, Surui [1, 2]
Liu, Tao [2]
Ren, Xue [2]
Li, Weizhong [3]
Chen, Chongjian [2]
Xie, Zhi [4]
Affiliations
[1] Sun Yat Sen Univ, Zhongshan Ophthalm Ctr, Guangzhou, Peoples R China
[2] Annoroad Gene Technol Beijing Co Ltd, Beijing 100176, Peoples R China
[3] Sun Yat Sen Univ, Zhongshan Sch Med, Guangzhou, Peoples R China
[4] Sun Yat Sen Univ, Zhongshan Ophthalm Ctr, Bioinformat, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
variant callers; germline variant; somatic variant
DOI
10.1093/bib/bbaa148
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
DNA variants are an important source of genetic variation among individuals. Next-generation sequencing (NGS) is the most popular technology for genome-wide variant calling, and third-generation sequencing (TGS) has also recently been used in genetic studies. Although many variant callers are available, no single caller can call both types of variants on NGS or TGS data with high sensitivity and specificity. In this study, we systematically evaluated 11 variant callers on 12 NGS and TGS datasets. For germline variant calling on NGS data, we tested the DNAseq and DNAscope modes from Sentieon, the HaplotypeCaller mode from GATK and the WGS mode from DeepVariant. All four callers had comparable performance on NGS data, and 30x coverage of WGS data was recommended. For germline variant calling on TGS data, we tested the DNAseq mode from Sentieon, the HaplotypeCaller mode from GATK and the PACBIO mode from DeepVariant. All three callers had similar performance in SNP calling, while DeepVariant outperformed the others in InDel calling. TGS detected more variants than NGS, particularly in complex and repetitive regions. For somatic variant calling on NGS data, we tested the TNscope and TNseq modes from Sentieon, the Mutect2 mode from GATK, NeuSomatic, VarScan2 and Strelka2. TNscope and Mutect2 outperformed the other callers. Increasing tumor sample purity from 10% to 20% significantly increased recall. Finally, the computational costs of the callers were compared, and Sentieon required the least. These results suggest that careful selection of a tool and its parameters is needed for accurate SNP or InDel calling under different scenarios.
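The abstract compares callers by sensitivity and recall. As a rough illustration of how such metrics are commonly derived, the following minimal sketch compares a call set against a truth set by exact match. It is not the authors' evaluation pipeline: the `benchmark` function, the `Metrics` type and the example variants are all hypothetical, and real benchmarks normally also reconcile differing representations of the same variant before matching.

```python
from typing import NamedTuple, Set, Tuple

# A variant is keyed by (chromosome, position, ref allele, alt allele).
# Exact-match keying is a simplifying assumption for this sketch.
Variant = Tuple[str, int, str, str]


class Metrics(NamedTuple):
    tp: int
    fp: int
    fn: int
    precision: float
    recall: float
    f1: float


def benchmark(calls: Set[Variant], truth: Set[Variant]) -> Metrics:
    """Compare a caller's output with a truth set by exact variant match."""
    tp = len(calls & truth)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but missed by the caller
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called sensitivity
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(tp, fp, fn, precision, recall, f1)


# Hypothetical toy data: four true variants, three of them recovered,
# plus one false-positive call.
truth = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 3000, "G", "GA"),   # insertion
    ("chr2", 4000, "TC", "T"),   # deletion
}
calls = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 3000, "G", "GA"),
    ("chr3", 5000, "A", "C"),    # not in the truth set
}

m = benchmark(calls, truth)
print(f"precision={m.precision:.2f} recall={m.recall:.2f} f1={m.f1:.2f}")
```

Running the same comparison separately on SNPs and InDels would mirror the per-variant-type comparison described in the abstract.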
Pages: 11