A Statistical Framework for Accurate Taxonomic Assignment of Metagenomic Sequencing Reads

Cited by: 21
Authors
Jiang, Hongmei [1]
An, Lingling [2, 3]
Lin, Simon M. [4, 5]
Feng, Gang [6]
Qiu, Yuqing [1]
Affiliations
[1] Northwestern Univ, Dept Stat, Evanston, IL 60208 USA
[2] Univ Arizona, Dept Agr & Biosyst Engn, Tucson, AZ USA
[3] Univ Arizona, Interdisciplinary Program Stat, Tucson, AZ USA
[4] Marshfield Clin Res Fdn, Biomed Informat Res Ctr, Marshfield, WI USA
[5] Univ Wisconsin, Inst Clin & Translat Res, Madison, WI USA
[6] Northwestern Univ, Biomed Informat Ctr, Chicago, IL 60611 USA
Source
PLOS ONE | 2012 / Volume 7 / Issue 10
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
GENERATION; BLAST;
DOI
10.1371/journal.pone.0046450
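Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract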
The advent of next-generation sequencing technologies has greatly promoted the field of metagenomics which studies genetic material recovered directly from an environment. Characterization of genomic composition of a metagenomic sample is essential for understanding the structure of the microbial community. Multiple genomes contained in a metagenomic sample can be identified and quantitated through homology searches of sequence reads with known sequences catalogued in reference databases. Traditionally, reads with multiple genomic hits are assigned to non-specific or high ranks of the taxonomy tree, thereby impacting on accurate estimates of relative abundance of multiple genomes present in a sample. Instead of assigning reads one by one to the taxonomy tree as many existing methods do, we propose a statistical framework to model the identified candidate genomes to which sequence reads have hits. After obtaining the estimated proportion of reads generated by each genome, sequence reads are assigned to the candidate genomes and the taxonomy tree based on the estimated probability by taking into account both sequence alignment scores and estimated genome abundance. The proposed method is comprehensively tested on both simulated datasets and two real datasets. It assigns reads to the low taxonomic ranks very accurately. Our statistical approach of taxonomic assignment of metagenomic reads, TAMER, is implemented in R and available at http://faculty.wcas.northwestern.edu/~hji403/MetaR.htm.
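The two-step idea in the abstract (first estimate the proportion of reads generated by each candidate genome, then assign each read using both its alignment evidence and those proportions) can be illustrated with a small sketch. This is not the TAMER implementation, which is written in R. It assumes a plain finite mixture model fitted by expectation-maximization, and the function names, the `likelihoods` matrix, and the toy numbers are all hypothetical. Each entry `likelihoods[i][j]` stands in for an alignment-score-derived likelihood of read `i` under genome `j`, with 0 meaning no hit.

```python
def posterior(likelihoods, pi):
    """Posterior probability of each genome for each read.

    Combines the (hypothetical) alignment-based likelihoods with the
    current abundance estimates pi, then normalizes per read.
    """
    post = []
    for row in likelihoods:
        weighted = [p * l for p, l in zip(pi, row)]
        total = sum(weighted)
        if total == 0.0:
            raise ValueError("every read needs at least one candidate genome")
        post.append([w / total for w in weighted])
    return post


def estimate_abundance(likelihoods, n_iter=200, tol=1e-10):
    """Estimate genome proportions with EM (an assumed fitting procedure)."""
    n_reads = len(likelihoods)
    n_genomes = len(likelihoods[0])
    pi = [1.0 / n_genomes] * n_genomes  # start from uniform abundance
    for _ in range(n_iter):
        post = posterior(likelihoods, pi)  # E-step
        # M-step: new proportion = average posterior mass per genome
        new_pi = [sum(row[j] for row in post) / n_reads
                  for j in range(n_genomes)]
        converged = max(abs(a - b) for a, b in zip(pi, new_pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


def assign_reads(likelihoods, pi):
    """Assign each read to the genome with the highest posterior."""
    return [max(range(len(row)), key=row.__getitem__)
            for row in posterior(likelihoods, pi)]


# Toy input: three reads hit only genome 0, one hits only genome 1,
# and the last read hits genomes 0 and 1 equally well.
reads = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
]
pi = estimate_abundance(reads)
labels = assign_reads(reads, pi)
print(pi, labels)
```

In this toy input the last read has identical alignment evidence for genomes 0 and 1, so the alignment score alone cannot separate them. Because genome 0 is estimated to be more abundant (the proportions settle near 0.75, 0.25, and 0), the read is assigned to genome 0 rather than being pushed up to a higher taxonomic rank.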
Pages: 10
Related papers
50 in total
  • [1] ACCURATE TAXONOMIC ASSIGNMENT OF SHORT PYROSEQUENCING READS
    Clemente, Jose C.
    Jansson, Jesper
    Valiente, Gabriel
    PACIFIC SYMPOSIUM ON BIOCOMPUTING 2010, 2010, : 3 - 9
  • [2] Flexible taxonomic assignment of ambiguous sequencing reads
    José C Clemente
    Jesper Jansson
    Gabriel Valiente
    BMC Bioinformatics, 12
  • [3] Flexible taxonomic assignment of ambiguous sequencing reads
    Clemente, Jose C.
    Jansson, Jesper
    Valiente, Gabriel
    BMC BIOINFORMATICS, 2011, 12
  • [4] Increase in taxonomic assignment efficiency of viral reads in metagenomic studies
    Francois, S.
    Filloux, D.
    Frayssinet, M.
    Roumagnac, P.
    Martin, D. P.
    Ogliastro, M.
    Froissart, R.
    VIRUS RESEARCH, 2018, 244 : 230 - 234
  • [5] A novel semi-supervised algorithm for the taxonomic assignment of metagenomic reads
    Vinh Van Le
    Lang Van Tran
    Hoai Van Tran
    BMC BIOINFORMATICS, 2016, 17
  • [6] A novel semi-supervised algorithm for the taxonomic assignment of metagenomic reads
    Vinh Van Le
    Lang Van Tran
    Hoai Van Tran
    BMC Bioinformatics, 17
  • [7] Orphelia: predicting genes in metagenomic sequencing reads
    Hoff, Katharina J.
    Lingner, Thomas
    Meinicke, Peter
    Tech, Maike
    NUCLEIC ACIDS RESEARCH, 2009, 37 : W101 - W105
  • [8] PIA: More Accurate Taxonomic Assignment of Metagenomic Data Demonstrated on sedaDNA From the North Sea
    Cribdon, Becky
    Ware, Roselyn
    Smith, Oliver
    Gaffney, Vincent
    Allaby, Robin G.
    FRONTIERS IN ECOLOGY AND EVOLUTION, 2020, 8
  • [9] Fast and sensitive taxonomic assignment to metagenomic contigs
    Mirdita, M.
    Steinegger, M.
    Breitwieser, F.
    Soeding, J.
    Karin, E. Levy
    BIOINFORMATICS, 2021, 37 (18) : 3029 - 3031
  • [10] MTR: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks
    Gori, Fabio
    Folino, Gianluigi
    Jetten, Mike S. M.
    Marchiori, Elena
    BIOINFORMATICS, 2011, 27 (02) : 196 - 203