Analysis of error profiles in deep next-generation sequencing data

Cited by: 156
Authors
Ma, Xiaotu [1]
Shao, Ying [1]
Tian, Liqing [1]
Flasch, Diane A. [1]
Mulder, Heather L. [1]
Edmonson, Michael N. [1]
Liu, Yu [1]
Chen, Xiang [1]
Newman, Scott [1]
Nakitandwe, Joy [2]
Li, Yongjin [1]
Li, Benshang [3]
Shen, Shuhong [3]
Wang, Zhaoming [1, 4]
Shurtleff, Sheila [2]
Robison, Leslie L. [4]
Levy, Shawn [5]
Easton, John [1]
Zhang, Jinghui [1]
Affiliations
[1] St Jude Childrens Res Hosp, Dept Computat Biol, 332 N Lauderdale St, Memphis, TN 38105 USA
[2] St Jude Childrens Res Hosp, Dept Pathol, 332 N Lauderdale St, Memphis, TN 38105 USA
[3] Shanghai Jiao Tong Univ, Shanghai Childrens Med Ctr, Key Lab Pediat Hematol & Oncol, Minist Hlth,Dept Hematol & Oncol,Sch Med, Shanghai 200127, Peoples R China
[4] St Jude Childrens Res Hosp, Dept Epidemiol & Canc Control, 332 N Lauderdale St, Memphis, TN 38105 USA
[5] HudsonAlpha Inst Biotechnol, Huntsville, AL 35806 USA
Keywords
Deep sequencing; Error rate; Substitution; Subclonal; Detection; Hotspot mutation; CLONAL HEMATOPOIESIS; MUTATIONAL PROCESSES; DNA; RISK; SIGNATURES; LANDSCAPE; GENOME; GENES; AGE;
DOI
10.1186/s13059-019-1659-6
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Sequencing errors are key confounding factors for detecting low-frequency genetic variants that are important for cancer molecular diagnosis, treatment, and surveillance using deep next-generation sequencing (NGS). However, there is a lack of comprehensive understanding of errors introduced at various steps of a conventional NGS workflow, such as sample handling, library preparation, PCR enrichment, and sequencing. In this study, we use current NGS technology to systematically investigate these questions.
Results: By evaluating read-specific error distributions, we discover that the substitution error rate can be computationally suppressed to 10^-5 to 10^-4, which is 10- to 100-fold lower than generally considered achievable (10^-3) in the current literature. We then quantify substitution errors attributable to sample handling, library preparation, enrichment PCR, and sequencing by using multiple deep sequencing datasets. We find that error rates differ by nucleotide substitution type, ranging from 10^-5 for A>C/T>G, C>A/G>T, and C>G/G>C changes to 10^-4 for A>G/T>C changes. Furthermore, C>T/G>A errors exhibit strong sequence context dependency, sample-specific effects dominate elevated C>A/G>T errors, and target-enrichment PCR led to a 6-fold increase in the overall error rate. We also find that more than 70% of hotspot variants can be detected at 0.1 to 0.01% frequency with current NGS technology by applying in silico error suppression.
Conclusions: We present the first comprehensive analysis of sequencing error sources in conventional NGS workflows. The error profiles revealed by our study highlight new directions for further improving NGS analysis accuracy both experimentally and computationally, ultimately enhancing the precision of deep sequencing.
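As a rough illustration of why the background substitution error rate limits the lowest variant frequency that can be called, the sketch below treats erroneous reads at one site as a binomial draw. This is an assumption made for illustration only, not the authors' method: errors are taken to be independent across reads, and the names `binom_tail` and `min_alt_reads`, the depth of 100,000, and the false-positive level `alpha` are all hypothetical.

```python
import math


def binom_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summing the upper tail."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        # log of the binomial probability mass at i, computed via lgamma
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        term = math.exp(log_pmf)
        total += term
        # Past the mean the terms only shrink; stop once they are negligible.
        if i > n * p and term < total * 1e-17:
            break
    return min(1.0, total)


def min_alt_reads(depth, error_rate, alpha):
    """Smallest alt-read count that errors alone reach with probability < alpha."""
    for k in range(1, depth + 1):
        if binom_tail(k, depth, error_rate) < alpha:
            return k
    return depth + 1  # no count at this depth satisfies alpha


if __name__ == "__main__":
    depth, alpha = 100_000, 1e-3  # hypothetical depth and false-positive level
    for rate in (1e-3, 1e-4, 1e-5):  # error rates quoted in the abstract
        k = min_alt_reads(depth, rate, alpha)
        print(f"error rate {rate:.0e}: need >= {k} alt reads "
              f"(allele fraction about {k / depth:.4%})")
```

Under these assumptions, lowering the per-site error rate lowers the number of supporting reads needed before a variant stands out from noise, which is the intuition behind detecting hotspot variants at 0.1 to 0.01% frequency after error suppression. Real data violate the independence assumption (for example through context-dependent and sample-specific errors), so the printed thresholds are indicative only.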
Pages: 15
Related papers
50 in total
  • [1] Analysis of error profiles in deep next-generation sequencing data
    Ma, Xiaotu
    Zhang, Jinghui
    [J]. CANCER RESEARCH, 2019, 79 (13)
  • [2] Analysis of error profiles in deep next-generation sequencing data
    Xiaotu Ma
    Ying Shao
    Liqing Tian
    Diane A. Flasch
    Heather L. Mulder
    Michael N. Edmonson
    Yu Liu
    Xiang Chen
    Scott Newman
    Joy Nakitandwe
    Yongjin Li
    Benshang Li
    Shuhong Shen
    Zhaoming Wang
    Sheila Shurtleff
    Leslie L. Robison
    Shawn Levy
    John Easton
    Jinghui Zhang
    [J]. Genome Biology, 20
  • [3] Analysis of indel and structural variant error profiles in deep next generation sequencing data
    Shao, Ying
    Tran, Quang
    Kolekar, Pandurang
    Liu, Yanling
    McBride, Andrea
    Jones, Tyler
    Mulder, Heather
    Ji, Lingyun
    Huang, Benjamin
    Meshinchi, Soheil
    Klco, Jeffery
    Zhang, Jinghui
    Carroll, William
    Loh, Mignon
    Brown, Patrick
    Easton, John
    Ma, Xiaotu
    [J]. CANCER RESEARCH, 2023, 83 (08)
  • [4] Focus on next-generation sequencing data analysis
    Rusk N.
    [J]. Nature Methods, 2009, 6 (Suppl 11) : S1 - S1
  • [5] Pathway analysis with next-generation sequencing data
    Jinying Zhao
    Yun Zhu
    Eric Boerwinkle
    Momiao Xiong
    [J]. European Journal of Human Genetics, 2015, 23 : 507 - 515
  • [6] Applications and data analysis of next-generation sequencing
    Vogl, Ina
    Benet-Pages, Anna
    Eck, Sebastian H.
    Kuhn, Marius
    Vosberg, Sebastian
    Greif, Philipp A.
    Metzeler, Klaus H.
    Biskup, Saskia
    Mueller-Reible, Clemens
    Klein, Hanns-Georg
    [J]. LABORATORIUMSMEDIZIN-JOURNAL OF LABORATORY MEDICINE, 2013, 37 (06) : 305 - 315
  • [7] Pathway analysis with next-generation sequencing data
    Zhao, Jinying
    Zhu, Yun
    Boerwinkle, Eric
    Xiong, Momiao
    [J]. EUROPEAN JOURNAL OF HUMAN GENETICS, 2015, 23 (04) : 507 - 515
  • [8] MapReduce for accurate error correction of next-generation sequencing data
    Zhao, Liang
    Chen, Qingfeng
    Li, Wencui
    Jiang, Peng
    Wong, Limsoon
    Li, Jinyan
    [J]. BIOINFORMATICS, 2017, 33 (23) : 3844 - 3851
  • [9] NGSNGS: next-generation simulator for next-generation sequencing data
    Henriksen, Rasmus Amund
    Zhao, Lei
    Korneliussen, Thorfinn Sand
    [J]. BIOINFORMATICS, 2023, 39 (01)
  • [10] Factorial Analysis of Error Correction Performance Using Simulated Next-Generation Sequencing Data
    Akogwu, Isaac
    Wang, Nan
    Zhang, Chaoyang
    Hong, Huixiao
    Choi, Hwanseok
    Gong, Ping
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2016 : 1164 - 1169