Phylogenomic clustering for selecting non-redundant genomes for comparative genomics

Cited by: 32
Authors
Moreno-Hagelsieb, Gabriel [1]
Wang, Zilin [2]
Walsh, Stephanie [2]
ElSherbiny, Aisha [1]
Affiliations
[1] Wilfrid Laurier Univ, Dept Biol, Waterloo, ON N2L 3C5, Canada
[2] Wilfrid Laurier Univ, Dept Math, Waterloo, ON N2L 3C5, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
GENE CLUSTERS; CONSERVATION; SIGNATURE;
DOI
10.1093/bioinformatics/btt064
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: Analyses in comparative genomics often require non-redundant genome datasets. Eliminating redundancy is not as simple as keeping one strain for each named species because genomes might be redundant at a higher taxonomic level than that of species for some analyses; some strains with different species names can be as similar as most strains sharing a species name, whereas some strains sharing a species name can be so different that they should be put into different groups; and some genomes lack a species name. Results: We have implemented a method and Web server that clusters a genome dataset into groups of redundant genomes at different thresholds based on a few phylogenomic distance measures.
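The abstract does not say which clustering algorithm or distance measures the server uses, so the following is only a minimal sketch of the general idea: given pairwise distances between genomes, group those that fall within a chosen threshold and keep one representative per group. Single-linkage grouping, the function names, the genome labels, and the distance values are all illustrative assumptions, not the authors' method or data.

```python
from itertools import combinations


def cluster_at_threshold(names, dist, threshold):
    """Group genomes linked by pairwise distances <= threshold.

    Assumption: single-linkage grouping via union-find. The source
    does not specify the clustering algorithm actually used.
    """
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


def pick_representatives(groups):
    """Keep the first (alphabetically smallest) genome of each group."""
    return [g[0] for g in groups]


# Hypothetical genomes and made-up distances, for illustration only.
names = ["genomeA", "genomeB", "genomeC", "genomeD"]
dist = {
    frozenset(("genomeA", "genomeB")): 0.01,
    frozenset(("genomeA", "genomeC")): 0.20,
    frozenset(("genomeA", "genomeD")): 0.45,
    frozenset(("genomeB", "genomeC")): 0.22,
    frozenset(("genomeB", "genomeD")): 0.47,
    frozenset(("genomeC", "genomeD")): 0.40,
}

strict = cluster_at_threshold(names, dist, 0.05)
loose = cluster_at_threshold(names, dist, 0.25)
```

With these toy values, a strict threshold merges only the two near-identical genomes, while a looser one also absorbs the third, which mirrors the point that redundancy depends on the level at which an analysis is performed.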
Pages: 947-949
Number of pages: 3