Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning trees

Cited by: 167
Authors
Xu, Y [1]
Olman, V [1]
Xu, D [1]
Affiliation
[1] Oak Ridge National Laboratory, Protein Informatics Group, Life Sciences Division, Oak Ridge, TN 37831, USA
DOI
10.1093/bioinformatics/18.4.536
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Gene expression data clustering provides a powerful tool for studying functional relationships of genes in a biological process. Identifying correlated expression patterns of genes represents the basic challenge in this clustering problem.
Results: This paper describes a new framework for representing a set of multi-dimensional gene expression data as a Minimum Spanning Tree (MST), a concept from graph theory. A key property of this representation is that each cluster of the expression data corresponds to one subtree of the MST, which rigorously converts a multi-dimensional clustering problem into a tree partitioning problem. We have demonstrated that, although the inter-data relationship is greatly simplified in the MST representation, no essential information is lost for the purpose of clustering. Representing a set of multi-dimensional data as an MST has two key advantages: (1) the simple structure of a tree facilitates efficient implementations of rigorous clustering algorithms that would otherwise be computationally very demanding; and (2) because MST-based clustering does not depend on the detailed geometric shape of a cluster, it can overcome many of the problems faced by classical clustering algorithms. Based on the MST representation, we have developed a number of rigorous and efficient clustering algorithms, including two with guaranteed global optimality. We have implemented these algorithms in a software package, EXpression data Clustering Analysis and VisualizATiOn Resource (EXCAVATOR). To demonstrate its effectiveness, we tested it on three data sets: expression data from the yeast Saccharomyces cerevisiae, expression data from human fibroblasts in response to serum, and Arabidopsis expression data in response to chitin elicitation. The test results are highly encouraging.
Availability: EXCAVATOR is available on request from the authors.
Contact: xyn@ornl.gov
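The abstract's central idea is that each cluster corresponds to a subtree of the MST, so clustering reduces to partitioning a tree. This can be illustrated with a minimal sketch under stated assumptions. It assumes Euclidean distance between expression profiles and uses the simplest tree-partitioning rule, deleting the k-1 longest MST edges. The function names (`euclidean`, `prim_mst`, `mst_clusters`) are hypothetical. This is not EXCAVATOR's code and not necessarily any of the paper's algorithms, including the two globally optimal ones.

```python
import math
from typing import List, Sequence, Tuple

Edge = Tuple[float, int, int]  # (weight, u, v)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two expression profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prim_mst(points: Sequence[Sequence[float]]) -> List[Edge]:
    """MST edges of the complete distance graph over `points` (O(n^2) Prim)."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight linking each vertex to the tree
    parent = [-1] * n      # tree endpoint of that cheapest edge
    best[0] = 0.0
    edges: List[Edge] = []
    for _ in range(n):
        # Add the closest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        # Update the cheapest connection of every remaining vertex via u.
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_clusters(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Split points into k clusters by deleting the k-1 longest MST edges."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of points")
    edges = sorted(prim_mst(points))  # ascending by weight
    kept = edges[: n - k]             # MST has n-1 edges; keep the n-k lightest
    root = list(range(n))             # union-find forest over the points

    def find(x: int) -> int:
        while root[x] != x:
            root[x] = root[root[x]]   # path halving
            x = root[x]
        return x

    for _, u, v in kept:
        root[find(u)] = find(v)
    # Each remaining subtree is one cluster; label them 0..k-1 by first appearance.
    ids: dict = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


profiles = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],  # tight group
    [5.0, 5.1], [5.2, 4.9],              # second group
    [10.0, 0.0],                         # isolated profile
]
print(mst_clusters(profiles, 3))  # [0, 0, 0, 1, 1, 2]
```

Deleting the k-1 heaviest MST edges gives the same partition as single-linkage hierarchical clustering cut at k clusters (ties aside). The O(n^2) form of Prim's algorithm is chosen because the distance graph over expression profiles is complete and therefore dense.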
Pages: 536-545
Page count: 10
Related papers
  • [21] A graph-theoretical clustering method based on two rounds of minimum spanning trees
    Zhong, Caiming
    Miao, Duoqian
    Wang, Ruizhi
    [J]. PATTERN RECOGNITION, 2010, 43 (03) : 752 - 766
  • [22] GRAPH-THEORETIC APPROACH TO GOODNESS-OF-FIT IN COMPLETE-LINK HIERARCHICAL CLUSTERING
    BAKER, FB
    HUBERT, LJ
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1976, 71 (356) : 870 - 878
  • [25] POWER SYSTEM TOPOLOGICAL OBSERVABILITY USING A DIRECT GRAPH-THEORETIC APPROACH
    QUINTANA, VH
    SIMOESCOSTA, A
    MANDEL, A
    [J]. IEEE TRANSACTIONS ON POWER APPARATUS AND SYSTEMS, 1982, 101 (03) : 617 - 626
  • [26] Multi-layer assignment steganography using graph-theoretic approach
    Wu, Hanzhou
    Wang, Hongxia
    Zhao, Hong
    Yu, Xiuying
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2015, 74 (18) : 8171 - 8196
  • [27] An approach of feature selection using graph-theoretic heuristic and hill climbing
    Goswami, Saptarsi
    Das, Amit Kumar
    Guha, Priyanka
    Tarafdar, Arunabha
    Chakraborty, Sanjay
    Chakrabarti, Amlan
    Chakraborty, Basabi
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2019, 22 (02) : 615 - 631
  • [28] On Wavelet based Modeling of Neural Networks using Graph-theoretic Approach
    Bhosale, B.
    [J]. 20TH INTERNATIONAL CONGRESS ON MODELLING AND SIMULATION (MODSIM2013), 2013 : 712 - 718
  • [29] A graph-theoretic approach to analyzing knowledge bases containing rules, models and data
    Basu, Amit
    Blanning, Robert W.
    [J]. ANNALS OF OPERATIONS RESEARCH, 1997, 75 : 3 - 23
  • [30] A novel PSO-based graph-theoretic approach for identifying most relevant and non-redundant gene markers from gene expression data
    Mandal, Monalisa
    Mukhopadhyay, Anirban
    [J]. INTERNATIONAL JOURNAL OF PARALLEL EMERGENT AND DISTRIBUTED SYSTEMS, 2015, 30 (03) : 175 - 192