Advantages of long- and short-reads sequencing for the hybrid investigation of the Mycobacterium tuberculosis genome

Cited by: 8
Authors
Di Marco, Federico [1, 2]
Spitaleri, Andrea [1, 3]
Battaglia, Simone [1]
Batignani, Virginia [1]
Cabibbe, Andrea Maurizio [1]
Cirillo, Daniela Maria [1]
Affiliations
[1] IRCCS San Raffaele Sci Inst, Emerging Bacterial Pathogens Unit, Milan, Italy
[2] Fdn Ctr San Raffaele, Milan, Italy
[3] Univ Vita Salute San Raffaele, Milan, Italy
Keywords
next-generation sequencing; hybrid approach; long reads; drug resistance; Mycobacterium tuberculosis; transmission analysis; repetitive regions; RESISTANCE;
DOI
10.3389/fmicb.2023.1104456
Chinese Library Classification
Q93 [Microbiology]
Subject classification codes
071005; 100705
Abstract
Introduction: In the fight to limit the global spread of antibiotic resistance, computational challenges associated with sequencing technology can affect the accuracy of downstream analyses, including drug-resistance identification, transmission analysis, and genome resolution. About 10% of the Mycobacterium tuberculosis (MTB) genome consists of the PE/PPE family, a GC-rich repetitive genome region. Although short-read sequencing is widely used, its limitations in the PE/PPE regions are well recognized, because short reads cannot be mapped unambiguously onto the reference genome there. The aim of this study was to compare the performance of short-read (SRS), long-read (LRS), and hybrid-read (HYBR) based analyses across common investigative tasks: genome coverage estimation, variant calling and cluster analysis, drug-resistance detection, and de novo assembly.
Methods: For the study, 13 model MTB clinical isolates were sequenced with both SRS and LRS. HYBR reads were produced by correcting the long reads with the short reads. The FASTQ files from the three approaches were then processed with a customized version of MTBseq for genome coverage estimation and variant calling, and with two different assemblers for de novo assembly evaluation.
Results: The genome coverage estimates showed a lower 8X breadth of coverage for SRS than for LRS and HYBR. Within the PE/PPE genes, SRS gave poor coverage of the PE_PGRS family but acceptable coverage of the PE and PPE genes, whereas LRS and HYBR reached optimal coverage across the PE/PPE genes. For variant calling, HYBR showed the highest resolution, detecting the highest percentage of uniquely identified mutations compared with LRS and SRS. All three approaches agreed on the identification of two major clusters, with HYBR identifying a higher number of SNPs between the two clusters. In the comparison of assembly quality, HYBR and LRS obtained better results than SRS.
Discussion: In conclusion, depending on the aim of the investigation, SRS and LRS present complementary advantages and limitations. For a full resolution of MTB genomes, where all the analyses mentioned and both technologies are needed, the HYBR approach represents a valid option and a well-rounded strategy.
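The comparison metrics named in the abstract can be illustrated with a minimal sketch. This is not MTBseq code and not the authors' implementation; every function name below is hypothetical. It assumes that breadth of coverage at 8X means the fraction of positions with read depth of at least 8, that the share of "uniquely identified" mutations is measured against the union of calls from all approaches, that SNP distance counts positions where two isolates' alleles differ (absent positions treated as reference, positions lacking coverage ignored), and that clusters are formed by single linkage under an assumed threshold of 12 SNPs.

```python
from itertools import combinations


def breadth_of_coverage(depths, min_depth=8):
    """Fraction of positions whose read depth is at least `min_depth`."""
    if not depths:
        raise ValueError("depth profile is empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def uniquely_identified(calls_by_approach):
    """For each approach, the share of all distinct mutations that only it detected."""
    all_calls = set().union(*calls_by_approach.values())
    shares = {}
    for name, calls in calls_by_approach.items():
        # Mutations reported by any of the other approaches.
        others = set().union(
            *(c for other, c in calls_by_approach.items() if other != name)
        )
        unique = calls - others
        shares[name] = len(unique) / len(all_calls) if all_calls else 0.0
    return shares


def snp_distance(a, b):
    """Count positions where two isolates differ; `a`, `b` map position -> alt allele.

    A position absent from a profile is treated as the reference allele
    (assumption: every position is covered in both isolates).
    """
    return sum(1 for pos in a.keys() | b.keys() if a.get(pos) != b.get(pos))


def cluster(profiles, threshold=12):
    """Single-linkage clusters: isolates within `threshold` SNPs are joined.

    The default of 12 SNPs is an assumed, illustrative cut-off.
    """
    names = list(profiles)
    parent = {n: n for n in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if snp_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Toy data only; these are not results from the study.
    print(breadth_of_coverage([0, 5, 8, 10, 12, 3, 9, 20]))
    print(uniquely_identified({
        "SRS": {"A", "B"},
        "LRS": {"A", "C"},
        "HYBR": {"A", "B", "C", "D"},
    }))
    profiles = {
        "iso1": {100: "A", 200: "T"},
        "iso2": {100: "A", 200: "T", 300: "G"},
        "iso3": {p: "C" for p in range(1000, 1020)},
    }
    print(cluster(profiles))
```

On the toy inputs, 5 of 8 positions reach 8X (breadth 0.625), only the toy HYBR set contains a mutation no other approach reports (share 0.25), and iso1 and iso2 (1 SNP apart) fall into one cluster while iso3 stays separate. Union-find keeps single-linkage clustering simple; a real pipeline would also have to handle positions with insufficient coverage rather than assuming the reference allele.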
Pages: 9