Finding Biologically Accurate Clusterings in Hierarchical Tree Decompositions Using the Variation of Information

Cited by: 17
Authors
Navlakha, Saket [1, 2]
White, James [2]
Nagarajan, Niranjan [2]
Pop, Mihai [1, 2]
Kingsford, Carl [1, 2]
Affiliations
[1] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
[2] Univ Maryland, Ctr Bioinformat & Computat Biol, College Pk, MD 20742 USA
Funding
U.S. National Science Foundation
Keywords
clustering; hierarchical tree decompositions; metagenomics; OTUs; protein interaction networks; variation of information; MULTIPLE SEQUENCE ALIGNMENT; PROTEIN; DIVERSITY; NETWORKS; COMPLEXES; ALGORITHM;
DOI
10.1089/cmb.2009.0173
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Hierarchical clustering is a popular method for grouping together similar elements based on a distance measure between them. In many cases, annotations for some elements are known beforehand, which can aid the clustering process. We present a novel approach for decomposing a hierarchical clustering into the clusters that optimally match a set of known annotations, as measured by the variation of information metric. Our approach is general and does not require the user to enter the number of clusters desired. We apply it to two biological domains: finding protein complexes within protein interaction networks and identifying species within metagenomic DNA samples. For these two applications, we test the quality of our clusters by using them to predict complex and species membership, respectively. We find that our approach generally outperforms the commonly used heuristic methods.
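The objective described in the abstract can be illustrated with a small sketch. It assumes that clusterings are given as element-to-label maps and that the variation of information, VI(C, C') = H(C) + H(C') - 2I(C; C'), is computed only over elements that carry an annotation. It then picks, by brute-force enumeration of every way to cut the tree into whole subtrees, the partition with the lowest VI. The helper names (`variation_of_information`, `cuts`, `best_cut`) and the exhaustive search are illustrative assumptions, not the authors' algorithm. The search is exponential in tree size and only suits toy inputs.

```python
from collections import Counter
from collections.abc import Hashable, Mapping
from itertools import product
from math import log
from typing import Union

# Hypothetical tree encoding: a leaf is a string, an internal node is a tuple of children.
Tree = Union[str, tuple]


def variation_of_information(a: Mapping[Hashable, Hashable],
                             b: Mapping[Hashable, Hashable]) -> float:
    """VI (natural log) between two element->label maps, over their shared elements."""
    common = a.keys() & b.keys()
    n = len(common)
    if n == 0:
        return 0.0
    joint = Counter((a[x], b[x]) for x in common)  # |C_i intersect C'_j|
    size_a = Counter(a[x] for x in common)         # |C_i|
    size_b = Counter(b[x] for x in common)         # |C'_j|
    vi = 0.0
    for (i, j), n_ij in joint.items():
        r_ij = n_ij / n
        # VI = -sum r_ij * [log(r_ij / p_i) + log(r_ij / q_j)]
        vi -= r_ij * (log(n_ij / size_a[i]) + log(n_ij / size_b[j]))
    return vi


def leaves(t: Tree) -> list:
    """All leaf labels under node t, left to right."""
    if isinstance(t, str):
        return [t]
    return [x for child in t for x in leaves(child)]


def cuts(t: Tree):
    """Yield every partition of t's leaves into clusters that are whole subtrees."""
    yield [leaves(t)]  # keep t as a single cluster
    if not isinstance(t, str):
        # Otherwise cut below t: combine one cut from each child.
        for combo in product(*(list(cuts(c)) for c in t)):
            yield [cluster for part in combo for cluster in part]


def best_cut(tree: Tree, annotations: Mapping[Hashable, Hashable]):
    """Return (VI, partition) for the tree cut closest to the annotations."""
    best = None
    for partition in cuts(tree):
        labels = {x: k for k, cluster in enumerate(partition) for x in cluster}
        score = variation_of_information(labels, annotations)
        if best is None or score < best[0]:
            best = (score, partition)
    return best


if __name__ == "__main__":
    toy_tree = ((("a", "b"), "c"), ("d", "e"))
    toy_annotations = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y"}
    print(*best_cut(toy_tree, toy_annotations))
```

For the toy tree, the cut {a, b, c}, {d, e} reproduces the annotations exactly and scores VI = 0, while keeping the whole tree as one cluster scores the entropy of the annotations. Restricting VI to annotated elements lets unannotated leaves sit in any cluster without affecting the score, which mirrors the abstract's setting where only some labels are known.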
Pages: 503-516
Page count: 14
Related papers
  • [1] Finding Biologically Accurate Clusterings in Hierarchical Tree Decompositions Using the Variation of Information
    Navlakha, Saket
    White, James
    Nagarajan, Niranjan
    Pop, Mihai
    Kingsford, Carl
    RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, PROCEEDINGS, 2009, 5541 : 400 - +
  • [2] Finding frequent items in data streams using hierarchical information
    Wang, Xiaoyu
    Liu, Hongyan
    Han, Jiawei
    2007 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-8, 2007, : 1060 - +
  • [3] Finding the number of clusters in a dataset using an information theoretic hierarchical algorithm
    Aghagolzadeh, M.
    Soltanian-Zadeh, H.
    Araabi, B. N.
    Aghagolzadeh, A.
    2006 13TH IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS AND SYSTEMS, VOLS 1-3, 2006, : 1336 - +
  • [4] A hierarchical mixture of Markov models for finding biologically active metabolic paths using gene expression and protein classes
    Mamitsuka, H
    Okuno, Y
    2004 IEEE COMPUTATIONAL SYSTEMS BIOINFORMATICS CONFERENCE, PROCEEDINGS, 2004, : 341 - 352
  • [5] Self-organising maps for hierarchical tree view document clustering using contextual information
    Freeman, R
    Yin, HJ
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2002, 2002, 2412 : 123 - 128
  • [6] Toward combining thematic information with hierarchical multiscale segmentations using tree Markov random field model
    Zhang, Xueliang
    Xiao, Pengfeng
    Feng, Xuezhi
    ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2017, 131 : 134 - 146
  • [7] Detection of genetic variation in Indian population groups using a novel minisatellite probe and finding relationships through tree construction
    Saha, A
    Bamezai, R
    JOURNAL OF HUMAN GENETICS, 2000, 45 (04) : 207 - 211
  • [8] Feature Selection Using Maximum Feature Tree Embedded with Mutual Information and Coefficient of Variation for Bird Sound Classification
    Xu, Haifeng
    Zhang, Yan
    Liu, Jiang
    Lv, Danjv
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2021, 2021