Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning trees

Cited by: 167
Authors
Xu, Y [1]
Olman, V [1]
Xu, D [1]
Affiliations
[1] Oak Ridge National Laboratory, Protein Informatics Group, Life Sciences Division, Oak Ridge, TN 37831, USA
DOI
10.1093/bioinformatics/18.4.536
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Gene expression data clustering provides a powerful tool for studying functional relationships of genes in a biological process. Identifying correlated expression patterns of genes represents the basic challenge in this clustering problem.
Results: This paper describes a new framework for representing a set of multi-dimensional gene expression data as a Minimum Spanning Tree (MST), a concept from graph theory. A key property of this representation is that each cluster of the expression data corresponds to one subtree of the MST, which rigorously converts a multi-dimensional clustering problem to a tree partitioning problem. We have demonstrated that, although the inter-data relationship is greatly simplified in the MST representation, no essential information is lost for the purpose of clustering. Two key advantages in representing a set of multi-dimensional data as an MST are: (1) the simple structure of a tree facilitates efficient implementations of rigorous clustering algorithms, which otherwise are highly computationally challenging; and (2) as an MST-based clustering does not depend on the detailed geometric shape of a cluster, it can overcome many of the problems faced by classical clustering algorithms. Based on the MST representation, we have developed a number of rigorous and efficient clustering algorithms, including two with guaranteed global optimality. We have implemented these algorithms as a software package, EXpression data Clustering Analysis and VisualizATiOn Resource (EXCAVATOR). To demonstrate its effectiveness, we have tested it on three data sets: expression data from the yeast Saccharomyces cerevisiae, expression data on the response of human fibroblasts to serum, and Arabidopsis expression data in response to chitin elicitation. The test results are highly encouraging.
Availability: EXCAVATOR is available on request from the authors.
Contact: xyn@ornl.gov.
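The idea that each cluster corresponds to one subtree of the MST can be illustrated with a small sketch. The Python below is not the EXCAVATOR implementation and is not claimed to be one of the authors' algorithms; it shows only the simplest form of tree partitioning, in which the k-1 longest MST edges are cut. The function names `mst_edges` and `mst_clusters`, the use of Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete graph over `points`.

    Assumption: edge weight is Euclidean distance between expression
    profiles. Returns a list of (weight, u, v) tuples, one per MST edge.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known link from each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree with the cheapest link to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Relax the links from the newly added node.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_clusters(points, k):
    """Cut the k-1 longest MST edges and return one cluster label per point."""
    edges = sorted(mst_edges(points))
    kept = edges[: max(len(edges) - (k - 1), 0)]

    # Union-find over the kept edges: each remaining subtree is one cluster.
    root = list(range(len(points)))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, u, v in kept:
        root[find(u)] = find(v)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


# Toy data (hypothetical): two well-separated groups of three profiles each.
profiles = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(mst_clusters(profiles, 2))  # → [0, 0, 0, 1, 1, 1]
```

Because the partition is decided by edge lengths in the tree and not by distance to a cluster centre, this kind of cut makes no assumption about cluster shape, which matches advantage (2) in the abstract.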
Pages: 536-545
Page count: 10