A software tool 'CroCo' detects pervasive cross-species contamination in next generation sequencing data

Cited by: 64
Authors
Simion, Paul [1, 3]
Belkhir, Khalid [1 ]
Francois, Clementine [1 ]
Veyssier, Julien [1 ]
Rink, Jochen C. [2 ]
Manuel, Michael [3 ]
Philippe, Herve [4, 5]
Telford, Maximilian J. [6 ]
Affiliations
[1] Univ Montpellier, Inst Sci Evolut ISEM, CNRS, UMR 5554,IRD,EPHE, Montpellier, France
[2] Max Planck Inst Mol Cell Biol & Genet, Pfotenhauerstr 108, D-01307 Dresden, Germany
[3] Sorbonne Univ, CNRS, IBPS, Evolut Paris Seine UMR7138, Case 05,7 Quai St Bernard, F-75005 Paris, France
[4] CNRS, UMR 5321, Ctr Theorisat & Modelisat Biodiversite, Stn Ecol Theor & Expt, F-09200 Moulis, France
[5] Univ Montreal, Ctr Robert Cedergren, Dept Biochim, Montreal, PQ H3C 3J7, Canada
[6] UCL, Dept Genet Evolut & Environm, Ctr Lifes Origins & Evolut, Darwin Bldg,Gower St, London WC1E 6BT, England
Funding
European Research Council; Biotechnology and Biological Sciences Research Council (UK)
Keywords
Contamination; NGS; Phylogenomics; Ctenophora; RNA-SEQ DATA; SISTER GROUP; PHYLOGENY; PLACEMENT; ORIGIN; GENOME; ERROR;
DOI
10.1186/s12915-018-0486-7
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification code
07; 0710; 09
Abstract
Background: Multiple RNA samples are frequently processed together and often mixed before multiplex sequencing in the same sequencing run. While different samples can be separated post sequencing using sample barcodes, the possibility of cross contamination between biological samples from different species that have been processed or sequenced in parallel has the potential to be extremely deleterious for downstream analyses.
Results: We present CroCo, a software package for identifying and removing such cross contaminants from assembled transcriptomes. Using multiple, recently published sequence datasets, we show that cross contamination is consistently present at varying levels in real data. Using real and simulated data, we demonstrate that CroCo detects contaminants efficiently and correctly. Using a real example from a molecular phylogenetic dataset, we show that contaminants, if not eliminated, can have a decisive, deleterious impact on downstream comparative analyses.
Conclusions: Cross contamination is pervasive in new and published datasets and, if undetected, can have serious deleterious effects on downstream analyses. CroCo is a database-independent, multi-platform tool, designed for ease of use, that efficiently and accurately detects and removes cross contamination in assembled transcriptomes to avoid these problems. We suggest that the use of CroCo should become a standard cleaning step when processing multiple samples for transcriptome sequencing.
Pages: 9
Journal: BMC Biology, 16