A study on fast calling variants from next-generation sequencing data using decision tree

Cited by: 9
Authors
Li, Zhentang [1, 2]
Wang, Yi [3, 4, 5]
Wang, Fei [1, 2]
Affiliations
[1] Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
[2] Fudan Univ, Sch Comp Sci & Technol, Shanghai, Peoples R China
[3] Fudan Univ, MOE Key Lab Contemporary Anthropol, Shanghai 200438, Peoples R China
[4] Fudan Univ, Collaborat Innovat Ctr Genet & Dev Biol, State Key Lab Genet Engn, Shanghai 200438, Peoples R China
[5] Fudan Univ, Sch Life Sci, Shanghai 200438, Peoples R China
Source
BMC BIOINFORMATICS | 2018 / Volume 19
Funding
National Natural Science Foundation of China;
Keywords
Next-generation sequencing; Variant calling; Decision tree; FRAMEWORK; FORMAT;
DOI
10.1186/s12859-018-2147-9
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: The rapid development of next-generation sequencing (NGS) technology has continuously increased the throughput of sequencing data. However, for lack of a tool that is both fast and accurate, the analysis of NGS data, especially low-coverage data, remains challenging. Results: We proposed a decision-tree-based variant calling algorithm. Experiments on a set of real data indicate that our algorithm achieves high accuracy and sensitivity for SNVs and indels and adapts well to low-coverage data. In particular, our algorithm is markedly faster than 3 widely used tools in our experiments. Conclusions: We implemented our algorithm in a software tool named Fuwa and applied it together with 4 well-known variant callers, i.e., Platypus, GATK-UnifiedGenotyper, GATK-HaplotypeCaller and SAMtools, to three sequencing data sets of the well-studied sample NA12878, which were produced by whole-genome, whole-exome and low-coverage whole-genome sequencing, respectively. We also conducted additional experiments on the WGS data of 4 newly released samples that have not been used to populate dbSNP.
Pages: 14