Consensus assessment of the contamination level of publicly available cyanobacterial genomes

Cited by: 34
Authors
Cornet, Luc [1 ,2 ]
Meunier, Loic [1 ]
Van Vlierberghe, Mick [1 ]
Leonard, Raphael R. [1 ,3 ]
Durieu, Benoit [4 ]
Lara, Yannick [4 ]
Misztak, Agnieszka [1 ,5 ]
Sirjacobs, Damien [1 ]
Javaux, Emmanuelle J. [2 ]
Philippe, Herve [6 ]
Wilmotte, Annick [4 ]
Baurain, Denis [1 ]
Affiliations
[1] Univ Liege, InBioS PhytoSYST, Eukaryot Phylogen, Liege, Belgium
[2] Univ Liege, UR Geol Palaeobiogeol Palaeobot Palaeopalynol, Liege, Belgium
[3] Univ Liege, InBioS CIP, Macromol Crystallog, Liege, Belgium
[4] Univ Liege, Ctr Prot Engn, InBioS CIP, Liege, Belgium
[5] Intercollegiate Fac Biotechnol UG MUG, Gdansk, Poland
[6] Ctr Biodivers Theory & Modelling, Moulis, France
Source
PLOS ONE | 2018 / Vol. 13 / Issue 07
Funding
European Research Council;
Keywords
HORIZONTAL GENE-TRANSFER; MULTIPLE SEQUENCE ALIGNMENT; BACTERIAL DIVERSITY; EVOLUTION; ORIGIN; TOOL; QUANTIFICATION; ANNOTATION; CONSISTENT; QUALITY;
DOI
10.1371/journal.pone.0200323
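Chinese Library Classification (CLC) number
O [Mathematical and Physical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code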
07 ; 0710 ; 09 ;
Abstract
Publicly available genomes are crucial for phylogenetic and metagenomic studies, in which contaminating sequences can cause major problems. This issue is expected to be especially important for Cyanobacteria because axenic strains are notoriously difficult to obtain and keep in culture. Yet, despite their great scientific interest, no data are currently available concerning the quality of publicly available cyanobacterial genomes. As reliably detecting contaminants is a complex task, we designed a pipeline combining six methods in a consensus strategy to assess the contamination level of 440 genome assemblies of Cyanobacteria. Two methods are based on published reference databases of ribosomal genes (SSU rRNA 16S and ribosomal proteins), one is indirectly based on a reference database of marker genes (CheckM), and three are based on complete genome analysis. Among those genome-wide methods, Kraken and DIAMOND blastx share the same reference database that we derived from Ensembl Bacteria, whereas CONCOCT does not require any reference database, instead relying on differences in DNA tetramer frequencies. Given that each of the six methods appears to have its own strengths and limitations, we used the consensus of their rankings to infer that >5% of cyanobacterial genome assemblies are highly contaminated by foreign DNA (i.e., contaminants were detected by 5 or 6 methods). Our results will help researchers to check the quality of publicly available genomic data before use in their own analyses. Moreover, we argue that journals should make the submission of raw read data along with genome assemblies mandatory in order to facilitate the detection of contaminants in sequence databases.
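The consensus criterion stated in the abstract (an assembly counts as highly contaminated when contaminants are detected by 5 or 6 of the six methods) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the method keys, the per-method boolean flags, and the function names are hypothetical, and the sketch assumes each method's output has already been reduced to a yes/no detection call, a step the abstract does not describe.

```python
from typing import Dict

# Hypothetical keys for the six methods named in the abstract.
METHODS = (
    "ssu_rrna",            # SSU rRNA 16S reference database
    "ribosomal_proteins",  # ribosomal protein reference database
    "checkm",              # marker genes (CheckM)
    "kraken",              # genome-wide, Ensembl Bacteria-derived database
    "diamond_blastx",      # genome-wide, same database as Kraken
    "concoct",             # genome-wide, DNA tetramer frequencies
)


def count_detections(flags: Dict[str, bool]) -> int:
    """Count how many of the six methods flagged contaminants."""
    missing = [m for m in METHODS if m not in flags]
    if missing:
        raise ValueError(f"missing method results: {missing}")
    return sum(bool(flags[m]) for m in METHODS)


def is_highly_contaminated(flags: Dict[str, bool], min_methods: int = 5) -> bool:
    """Apply the 'detected by 5 or 6 methods' rule from the abstract."""
    return count_detections(flags) >= min_methods


def fraction_highly_contaminated(assemblies: Dict[str, Dict[str, bool]]) -> float:
    """Fraction of assemblies that meet the consensus rule."""
    if not assemblies:
        return 0.0
    flagged = sum(is_highly_contaminated(f) for f in assemblies.values())
    return flagged / len(assemblies)
```

Calling `fraction_highly_contaminated` on a dictionary that maps assembly identifiers to per-method flags gives the share of assemblies meeting the rule; any real use would first need a defensible way to turn each method's raw output into a detection call.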
Pages: 26