Optimized quantification of intra-host viral diversity in SARS-CoV-2 and influenza virus sequence data

Cited by: 9
Authors
Roder, A. E. [1]
Johnson, K. E. E. [1,2]
Knoll, M. [2]
Khalfan, M. [2]
Wang, B. [2]
Schultz-Cherry, S. [3]
Banakis, S. [1]
Kreitman, A. [1]
Mederos, C. [1]
Youn, J.-H. [4]
Mercado, R. [4]
Wang, W. [1]
Chung, M. [1]
Ruchnewitz, D. [5]
Samanovic, M. I. [6]
Mulligan, M. J. [6]
Laessig, M. [5]
Luksza, M. [7]
Das, S. [4]
Gresham, D. [2]
Ghedin, E. [1,2]
Affiliations
[1] NIAID, Syst Genom Sect, Lab Parasit Dis, DIR,NIH, Bethesda, MD 20892 USA
[2] NYU, Ctr Genom & Syst Biol, Dept Biol, New York, NY 10012 USA
[3] St Jude Childrens Res Hosp, Dept Infect Dis, Memphis, TN USA
[4] NIH, Dept Lab Med, Bethesda, MD USA
[5] Univ Cologne, Inst Biol Phys, Cologne, Germany
[6] NYU, Langone Vaccine Ctr, Dept Med, New York, NY USA
[7] Icahn Sch Med Mt Sinai, Dept Oncol Sci, New York, NY USA
Source
MBIO | 2023 / Vol. 14 / Issue 04
Keywords
SARS-CoV-2; influenza; genomics; bioinformatics; RNA; SELECTION; EVOLUTION; MUTATION; CANCER;
DOI
10.1128/mbio.01046-23
Chinese Library Classification number
Q93 [Microbiology];
Discipline classification code
071005; 100705;
Abstract
High error rates of viral RNA-dependent RNA polymerases lead to diverse intra-host viral populations during infection. Errors made during replication that are not strongly deleterious to the virus can lead to the generation of minority variants. However, accurate detection of minority variants in viral sequence data is complicated by errors introduced during sample preparation and data analysis. We used synthetic RNA controls and simulated data to test seven variant-calling tools across a range of allele frequencies and simulated coverages. We show that choice of variant caller and use of replicate sequencing have the most significant impact on single-nucleotide variant (SNV) discovery and demonstrate how both allele frequency and coverage thresholds impact both false discovery and false-negative rates. When replicates are not available, using a combination of multiple callers with more stringent cutoffs is recommended. We use these parameters to find minority variants in sequencing data from SARS-CoV-2 clinical specimens and provide guidance for studies of intra-host viral diversity using either single replicate data or data from technical replicates. Our study provides a framework for rigorous assessment of technical factors that impact SNV identification in viral samples and establishes heuristics that will inform and improve future studies of intra-host variation, viral diversity, and viral evolution.
IMPORTANCE: When viruses replicate inside a host cell, the virus replication machinery makes mistakes. Over time, these mistakes create mutations that result in a diverse population of viruses inside the host. Mutations that are neither lethal to the virus nor strongly beneficial can lead to minority variants that are minor members of the virus population. However, preparing samples for sequencing can also introduce errors that resemble minority variants, resulting in the inclusion of false-positive data if not filtered correctly. In this study, we aimed to determine the best methods for identification and quantification of these minority variants by testing the performance of seven commonly used variant-calling tools. We used simulated and synthetic data to test their performance against a true set of variants and then used these studies to inform variant identification in data from SARS-CoV-2 clinical specimens. Together, analyses of our data provide extensive guidance for future studies of viral diversity and evolution.
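The abstract's recommendation for single-replicate data (combine multiple callers and apply stringent allele frequency and coverage cutoffs) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Call` record, the `consensus_snvs` function, the caller labels, the example positions, and the default cutoffs (3% frequency, 200x coverage, agreement of two callers) are all hypothetical placeholders chosen for illustration, since the abstract does not state specific values.

```python
from collections import defaultdict
from typing import NamedTuple


class Call(NamedTuple):
    """One SNV call from one tool (hypothetical record layout)."""
    caller: str     # label of the variant-calling tool
    position: int   # 1-based genome position
    ref: str        # reference base
    alt: str        # alternate base
    depth: int      # read coverage at the position
    alt_reads: int  # reads supporting the alternate base


def consensus_snvs(calls, min_freq=0.03, min_depth=200, min_callers=2):
    """Return SNVs that pass both cutoffs in at least `min_callers` tools.

    The default cutoffs are illustrative assumptions, not values from the paper.
    """
    support = defaultdict(set)
    for c in calls:
        # Coverage cutoff (also guards against division by zero).
        if c.depth <= 0 or c.depth < min_depth:
            continue
        # Allele frequency cutoff.
        if c.alt_reads / c.depth < min_freq:
            continue
        support[(c.position, c.ref, c.alt)].add(c.caller)
    return sorted(snv for snv, tools in support.items() if len(tools) >= min_callers)


# Made-up calls from two hypothetical tools.
calls = [
    Call("caller_a", 100, "C", "T", 1000, 50),  # 5.0%, passes
    Call("caller_b", 100, "C", "T", 980, 49),   # 5.0%, passes
    Call("caller_a", 200, "G", "A", 1200, 12),  # 1.0%, below frequency cutoff
    Call("caller_b", 200, "G", "A", 1200, 60),  # 5.0%, but only one tool agrees
    Call("caller_a", 300, "A", "G", 90, 20),    # coverage too low
    Call("caller_b", 300, "A", "G", 95, 22),    # coverage too low
]

print(consensus_snvs(calls))
```

With these made-up inputs, only the position-100 variant survives: position 200 passes in a single tool, and position 300 fails the coverage cutoff in both. Relaxing `min_callers` to 1 would also admit position 200, which shows how the agreement requirement trades sensitivity for fewer false positives.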
Pages: 17
Related papers
50 in total
  • [31] Intra-host variation and evolutionary dynamics of SARS-CoV-2 populations in COVID-19 patients
    Wang, Yanqun
    Wang, Daxi
    Zhang, Lu
    Sun, Wanying
    Zhang, Zhaoyong
    Chen, Weijun
    Zhu, Airu
    Huang, Yongbo
    Xiao, Fei
    Yao, Jinxiu
    Gan, Mian
    Li, Fang
    Luo, Ling
    Huang, Xiaofang
    Zhang, Yanjun
    Sook-san Wong
    Cheng, Xinyi
    Ji, Jingkai
    Ou, Zhihua
    Xiao, Minfeng
    Li, Min
    Li, Jiandong
    Ren, Peidi
    Deng, Ziqing
    Zhong, Huanzi
    Xu, Xun
    Song, Tie
    Mok, Chris Ka Pun
    Peiris, Malik
    Zhong, Nanshan
    Zhao, Jingxian
    Li, Yimin
    Li, Junhua
    Zhao, Jincun
    GENOME MEDICINE, 2021, 13 (01)
  • [32] Intra-host mutation rate of acute SARS-CoV-2 infection during the initial pandemic wave
    Kim El-Haddad
    Thamali M. Adhikari
    Zheng Jin Tu
    Yu-Wei Cheng
    Xiaoyi Leng
    Xiangyi Zhang
    Daniel Rhoads
    Jennifer S. Ko
    Sarah Worley
    Jing Li
    Brian P. Rubin
    Frank P. Esper
    Virus Genes, 2023, 59 : 653 - 661
  • [33] Influenza Virus and SARS-CoV-2 Vaccines
    Sandor, Adam M.
    Sturdivant, Michael S.
    Ting, Jenny P. Y.
    JOURNAL OF IMMUNOLOGY, 2021, 206 (11): : 2509 - 2520
  • [34] SARS-CoV-2 and Influenza Virus Coinfection
    Kaptan, Figen
    FLORA INFEKSIYON HASTALIKLARI VE KLINIK MIKROBIYOLOJI DERGISI, 2020, 25 (04): : 457 - 463
  • [35] Coinfection with SARS-CoV-2 and influenza A virus
    Kondo, Yuki
    Miyazaki, Shinichi
    Yamashita, Ryo
    Ikeda, Takuya
    BMJ CASE REPORTS, 2020, 13 (07)
  • [36] Influenza virus and SARS-CoV-2: pathogenesis and host responses in the respiratory tract
    Tim Flerlage
    David F. Boyd
    Victoria Meliopoulos
    Paul G. Thomas
    Stacey Schultz-Cherry
    Nature Reviews Microbiology, 2021, 19 : 425 - 441
  • [37] Influenza virus and SARS-CoV-2: pathogenesis and host responses in the respiratory tract
    Flerlage, Tim
    Boyd, David F.
    Meliopoulos, Victoria
    Thomas, Paul G.
    Schultz-Cherry, Stacey
    NATURE REVIEWS MICROBIOLOGY, 2021, 19 (07) : 425 - 441
  • [38] Intra-host non-synonymous diversity at a neutralizing antibody epitope of SARS-CoV-2 spike protein N-terminal domain
    Ip, Jonathan Daniel
    Kok, Kin-Hang
    Chan, Wan-Mui
    Chu, Allen Wing-Ho
    Wu, Wai-Lan
    Yip, Cyril Chik-Yan
    To, Wing-Kin
    Tsang, Owen Tak-Yin
    Leung, Wai-Shing
    Chik, Thomas Shiu-Hong
    Chan, Kwok-Hung
    Hung, Ivan Fan-Ngai
    Yuen, Kwok-Yung
    To, Kelvin Kai-Wang
    CLINICAL MICROBIOLOGY AND INFECTION, 2021, 27 (09) : 1350.e1 - 1350.e5
  • [39] Investigating intra-host and intra-herd sequence diversity of foot-and-mouth disease virus
    King, David J.
    Freimanis, Graham L.
    Orton, Richard J.
    Waters, Ryan A.
    Haydon, Daniel T.
    King, Donald P.
    INFECTION GENETICS AND EVOLUTION, 2016, 44 : 286 - 292
  • [40] SARS-COV-2 VIRAL SEQUENCE DIVERSITY IN TWO LIMITED NORTH TEXAS COHORT
    Smith, A.
    Guo, Z.
    Gunby, T.
    Haseltine, F.
    Nilsson, R. I.
    HUMAN IMMUNOLOGY, 2022, 83 : 119 - 119