BMC3C: binning metagenomic contigs using codon usage, sequence composition and read coverage

Cited by: 34
Authors
Yu, Guoxian [1]
Jiang, Yuan [1]
Wang, Jun [1]
Zhang, Hao [2,3]
Luo, Haiwei [2,3]
Affiliations
[1] Southwest Univ, Coll Comp & Informat Sci, Chongqing 400715, Peoples R China
[2] Chinese Univ Hong Kong, Sch Life Sci, Shatin, Hong Kong, Peoples R China
[3] Chinese Univ Hong Kong, Partner State Key Lab Agrobiotechnol, Shatin, Hong Kong, Peoples R China
Keywords
PHYLOGENETIC CLASSIFICATION; GENOMES; ALGORITHM;
DOI
10.1093/bioinformatics/bty519
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Metagenomics investigates DNA sequences recovered directly from environmental samples. Analysis often starts with read assembly, which yields contigs rather than more complete genomes. Contig binning methods are therefore used to group contigs into genome bins. Although several clustering-based binning methods have been developed, they generally suffer from limited stability and robustness.

Results: We introduce BMC3C, an ensemble clustering-based method that bins contigs accurately and robustly by using DNA sequence Composition, Coverage across multiple samples and Codon usage. BMC3C first searches for an appropriate number of clusters and then repeatedly applies k-means clustering with different initializations to cluster the contigs. Next, it derives a weighted graph from these clusterings, in which each node represents a contig: if two contigs are frequently grouped into the same cluster, the weight of the edge between them is high; otherwise it is low. Finally, BMC3C employs a graph partitioning technique to split the weighted graph into subgraphs, each corresponding to a genome bin. We conduct experiments on both simulated and real-world datasets to evaluate BMC3C and compare it with state-of-the-art binning tools, showing that BMC3C achieves improved performance over these tools. To our knowledge, this is the first time that codon usage features and ensemble clustering have been used in metagenomic contig binning.
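The ensemble clustering procedure summarized in the abstract can be illustrated with a small, self-contained sketch. It is not the BMC3C implementation and rests on several assumptions: the toy 2-D points stand in for the concatenated composition, coverage and codon-usage feature vectors; the number of clusters k is supplied rather than searched for; and the graph-partitioning step is replaced by a simple stand-in (keep edges whose co-clustering frequency exceeds a threshold, then take connected components). The names `kmeans`, `co_association` and `partition` are hypothetical.

```python
import random


def kmeans(points, k, seed, iters=50):
    """Lloyd's k-means on lists of floats; returns one cluster label per point."""
    rng = random.Random(seed)  # the seed controls the initialization of each run
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Move each center to the mean of its members; an empty cluster keeps its center.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def co_association(points, k, runs):
    """Weight w[i][j] = fraction of k-means runs that put contigs i and j together."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for seed in range(runs):
        labels = kmeans(points, k, seed)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    weights[i][j] += 1.0 / runs
    return weights


def partition(weights, threshold=0.5):
    """Stand-in partitioner (assumption, not BMC3C's method):
    connected components over edges whose weight exceeds the threshold."""
    n = len(weights)
    bin_of = [-1] * n
    next_bin = 0
    for start in range(n):
        if bin_of[start] != -1:
            continue
        bin_of[start] = next_bin
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if bin_of[j] == -1 and weights[i][j] > threshold:
                    bin_of[j] = next_bin
                    stack.append(j)
        next_bin += 1
    return bin_of


if __name__ == "__main__":
    # Toy "contig" feature vectors forming two well-separated groups.
    contigs = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
    w = co_association(contigs, k=2, runs=10)
    print(partition(w))  # → [0, 0, 0, 1, 1, 1]
```

Because each run starts from a different seed, contig pairs that land together in most runs receive weights near 1, while pairings produced by an unlucky initialization are averaged toward 0. This averaging is what makes the consensus less sensitive to any single k-means run than one run alone.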
Pages: 4172-4179
Page count: 8