Error correction of next-generation sequencing data and reliable estimation of HIV quasispecies

Cited by: 150
Authors
Zagordi, Osvaldo [1]
Klein, Rolf [2]
Daeumer, Martin [2]
Beerenwinkel, Niko [1]
Affiliations
[1] Swiss Fed Inst Technol, Dept Biosyst Sci & Engn, CH-4058 Basel, Switzerland
[2] Inst Immunol & Genet, D-67655 Kaiserslautern, Germany
Funding
Swiss National Science Foundation
Keywords
DRUG-RESISTANCE MUTATIONS; MOLECULAR-BIOLOGY; NAIVE PATIENTS; POPULATIONS; GENO2PHENO; DIVERSITY; GENOME;
DOI
10.1093/nar/gkq655
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Next-generation sequencing technologies can be used to analyse genetically heterogeneous samples at unprecedented detail. The high coverage achievable with these methods enables the detection of many low-frequency variants. However, sequencing errors complicate the analysis of mixed populations and result in inflated estimates of genetic diversity. We developed a probabilistic Bayesian approach to minimize the effect of errors on the detection of minority variants. We applied it to pyrosequencing data obtained from a 1.5-kb-fragment of the HIV-1 gag/pol gene in two control and two clinical samples. The effect of PCR amplification was analysed. Error correction resulted in a two- and five-fold decrease of the pyrosequencing base substitution rate, from 0.05% to 0.03% and from 0.25% to 0.05% in the non-PCR and PCR-amplified samples, respectively. We were able to detect viral clones as rare as 0.1% with perfect sequence reconstruction. Probabilistic haplotype inference outperforms the counting-based calling method in both precision and recall. Genetic diversity observed within and between two clinical samples resulted in various patterns of phenotypic drug resistance and suggests a close epidemiological link. We conclude that pyrosequencing can be used to investigate genetically diverse samples with high accuracy if technical errors are properly treated.
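The abstract's quantitative claims (fold decrease in error rate, precision and recall of haplotype calling) can be illustrated with a minimal sketch. This is not the authors' method or code. The function names and the toy haplotype sets below are hypothetical and chosen only for illustration; the error rates are the ones quoted in the abstract.

```python
# Illustrative sketch only: helper names and toy haplotypes are assumptions,
# not taken from the paper.

def fold_decrease(rate_before, rate_after):
    """Ratio of an error rate before correction to the rate after correction."""
    return rate_before / rate_after


def precision_recall(called, truth):
    """Precision and recall of a set of called haplotypes against a known set.

    precision = fraction of called haplotypes that are truly present
    recall    = fraction of true haplotypes that were called
    """
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Substitution rates (in percent) quoted in the abstract.
non_pcr_fold = fold_decrease(0.05, 0.03)  # non-PCR sample
pcr_fold = fold_decrease(0.25, 0.05)      # PCR-amplified sample

# Hypothetical control mixture: three true haplotypes, four called ones.
truth = {"ACGT", "ACGA", "TCGT"}
called = {"ACGT", "ACGA", "GGGT", "ACCC"}
precision, recall = precision_recall(called, truth)

print(f"non-PCR fold decrease: {non_pcr_fold:.2f}")
print(f"PCR fold decrease: {pcr_fold:.2f}")
print(f"precision: {precision:.2f}, recall: {recall:.2f}")
```

With the quoted rates, the ratio for the non-PCR sample is about 1.7, which the abstract reports as roughly two-fold, and the ratio for the PCR-amplified sample is 5. In the toy example, two of the four called haplotypes are real, so uncorrected sequencing errors that create spurious haplotypes would lower precision, while missed rare clones would lower recall.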
Pages: 7400-7409
Page count: 10