A software tool 'CroCo' detects pervasive cross-species contamination in next generation sequencing data

Cited by: 64
Authors
Simion, Paul [1,3]
Belkhir, Khalid [1]
Francois, Clementine [1]
Veyssier, Julien [1]
Rink, Jochen C. [2]
Manuel, Michael [3]
Philippe, Herve [4,5]
Telford, Maximilian J. [6]
Affiliations
[1] Univ Montpellier, Inst Sci Evolut ISEM, CNRS, UMR 5554,IRD,EPHE, Montpellier, France
[2] Max Planck Inst Mol Cell Biol & Genet, Pfotenhauerstr 108, D-01307 Dresden, Germany
[3] Sorbonne Univ, CNRS, IBPS, Evolut Paris Seine UMR7138, Case 05,7 Quai St Bernard, F-75005 Paris, France
[4] CNRS, UMR 5321, Ctr Theorisat & Modelisat Biodiversite, Stn Ecol Theor & Expt, F-09200 Moulis, France
[5] Univ Montreal, Ctr Robert Cedergren, Dept Biochim, Montreal, PQ H3C 3J7, Canada
[6] UCL, Dept Genet Evolut & Environm, Ctr Lifes Origins & Evolut, Darwin Bldg,Gower St, London WC1E 6BT, England
Funding
Biotechnology and Biological Sciences Research Council (UK); European Research Council
Keywords
Contamination; NGS; Phylogenomics; Ctenophora; RNA-SEQ DATA; SISTER GROUP; PHYLOGENY; PLACEMENT; ORIGIN; GENOME; ERROR;
DOI
10.1186/s12915-018-0486-7
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: Multiple RNA samples are frequently processed together and often mixed before multiplex sequencing in the same sequencing run. While different samples can be separated post sequencing using sample barcodes, the possibility of cross contamination between biological samples from different species that have been processed or sequenced in parallel has the potential to be extremely deleterious for downstream analyses.
Results: We present CroCo, a software package for identifying and removing such cross contaminants from assembled transcriptomes. Using multiple, recently published sequence datasets, we show that cross contamination is consistently present at varying levels in real data. Using real and simulated data, we demonstrate that CroCo detects contaminants efficiently and correctly. Using a real example from a molecular phylogenetic dataset, we show that contaminants, if not eliminated, can have a decisive, deleterious impact on downstream comparative analyses.
Conclusions: Cross contamination is pervasive in new and published datasets and, if undetected, can have serious deleterious effects on downstream analyses. CroCo is a database-independent, multi-platform tool, designed for ease of use, that efficiently and accurately detects and removes cross contamination in assembled transcriptomes to avoid these problems. We suggest that the use of CroCo should become a standard cleaning step when processing multiple samples for transcriptome sequencing.
Pages: 9