Error correction of next-generation sequencing data and reliable estimation of HIV quasispecies

Cited by: 150
Authors
Zagordi, Osvaldo [1 ]
Klein, Rolf [2 ]
Daeumer, Martin [2 ]
Beerenwinkel, Niko [1 ]
Affiliations
[1] Swiss Fed Inst Technol, Dept Biosyst Sci & Engn, CH-4058 Basel, Switzerland
[2] Inst Immunol & Genet, D-67655 Kaiserslautern, Germany
Funding
Swiss National Science Foundation
Keywords
DRUG-RESISTANCE MUTATIONS; MOLECULAR-BIOLOGY; NAIVE PATIENTS; POPULATIONS; GENO2PHENO; DIVERSITY; GENOME;
DOI
10.1093/nar/gkq655
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Next-generation sequencing technologies can be used to analyse genetically heterogeneous samples at unprecedented detail. The high coverage achievable with these methods enables the detection of many low-frequency variants. However, sequencing errors complicate the analysis of mixed populations and result in inflated estimates of genetic diversity. We developed a probabilistic Bayesian approach to minimize the effect of errors on the detection of minority variants. We applied it to pyrosequencing data obtained from a 1.5-kb-fragment of the HIV-1 gag/pol gene in two control and two clinical samples. The effect of PCR amplification was analysed. Error correction resulted in a two- and five-fold decrease of the pyrosequencing base substitution rate, from 0.05% to 0.03% and from 0.25% to 0.05% in the non-PCR and PCR-amplified samples, respectively. We were able to detect viral clones as rare as 0.1% with perfect sequence reconstruction. Probabilistic haplotype inference outperforms the counting-based calling method in both precision and recall. Genetic diversity observed within and between two clinical samples resulted in various patterns of phenotypic drug resistance and suggests a close epidemiological link. We conclude that pyrosequencing can be used to investigate genetically diverse samples with high accuracy if technical errors are properly treated.
Pages: 7400-7409
Number of pages: 10