Automated Hierarchical Density Shaving: A Robust Automated Clustering and Visualization Framework for Large Biological Data Sets

Cited by: 20
Authors
Gupta, Gunjan [1]
Liu, Alexander [2]
Ghosh, Joydeep [2]
Affiliations
[1] Amazon Com, Seattle, WA 98114 USA
[2] Univ Texas Austin, Dept Elect & Comp Engn, Austin, TX 78712 USA
Funding
National Science Foundation (USA)
Keywords
Mining methods and algorithms; data and knowledge visualization; clustering; bioinformatics; YEAST; CELL;
DOI
10.1109/TCBB.2008.32
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
A key application of clustering data obtained from sources such as microarrays, protein mass spectroscopy, and phylogenetic profiles is the detection of functionally related genes. Typically, only a small number of functionally related genes cluster into one or more groups, and the rest need to be ignored. For such situations, we present Automated Hierarchical Density Shaving (Auto-HDS), a framework that consists of a fast hierarchical density-based clustering algorithm and an unsupervised model selection strategy. Auto-HDS can automatically select clusters of different densities, present them in a compact hierarchy, and rank individual clusters using an innovative stability criterion. Our framework also provides a simple yet powerful 2D visualization of the hierarchy of clusters that is useful for further interactive exploration. We present results on the Gasch and Lee microarray data sets to show the effectiveness of our methods. Additional results on other biological data are included in the supplemental material.
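The abstract's central idea, clustering only the dense part of the data and ignoring the rest, can be illustrated with a small sketch. This is not the Auto-HDS algorithm: it is a generic single-level density-based pass, and the function name `density_shave` and the parameters `radius` and `min_neighbors` are hypothetical names chosen for the illustration. It omits the hierarchy, the stability-based ranking, and the visualization described above.

```python
from math import dist  # Euclidean distance, Python 3.8+


def density_shave(points, radius, min_neighbors):
    """Illustrative only: label dense points by connected component.

    A point is "dense" if at least `min_neighbors` points (itself included)
    lie within `radius`. Non-dense points are discarded and labeled -1.
    """
    n = len(points)
    # Neighbor lists: indices within `radius` of each point (including itself).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= radius]
        for i in range(n)
    ]
    dense = [len(nb) >= min_neighbors for nb in neighbors]

    labels = [-1] * n
    cluster = 0
    for start in range(n):
        if not dense[start] or labels[start] != -1:
            continue
        # Flood-fill over dense points that are within `radius` of each other.
        labels[start] = cluster
        stack = [start]
        while stack:
            i = stack.pop()
            for j in neighbors[i]:
                if dense[j] and labels[j] == -1:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1
    return labels


# Toy data (made up for the example): two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
print(density_shave(points, radius=1.5, min_neighbors=3))
# prints [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Running the same function over a range of `radius` values gives a crude sequence of nested clusterings, which is only loosely analogous to the compact hierarchy the abstract refers to.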
Pages: 223-237
Page count: 15
Related papers
50 in total
  • [1] Hierarchical density shaving: A clustering and visualization framework for large biological datasets
    Gupta, Gunjan
    Liu, Alexander
    Ghosh, Joydeep
    [J]. ICDM 2006: SIXTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, WORKSHOPS, 2006, : 89 - +
  • [2] Automated Clustering of Large Data Sets Based on a Topology Representing Graph
    Tasdemir, Kadim
    [J]. 2009 IEEE 17TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, VOLS 1 AND 2, 2009, : 105 - 108
  • [3] An automated robust algorithm for clustering multivariate data
    Vishwakarma, Gajendra K.
    Paul, Chinmoy
    Hadi, Ali S.
    Elsawah, A. M.
    [J]. JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2023, 429
  • [4] Interactive hierarchical displays: a general framework for visualization and exploration of large multivariate data sets
    Yang, J
    Ward, MO
    Rundensteiner, EA
    [J]. COMPUTERS & GRAPHICS-UK, 2003, 27 (02): : 265 - 283
  • [5] Semi-automated clustering of gene expression data sets
    Kim, Minho
    Jung, Ho-Youl
    Chung, Myungguen
    Kim, Pora
    Park, Seon-Hee
    Park, Soo-Jun
    [J]. 2007 ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-16, 2007, : 4625 - 4628
  • [6] Automated pharmacophore identification for large chemical data sets
    Chen, X
    Rusinko, A
    Tropsha, A
    Young, SS
    [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1999, 39 (05): : 887 - 896
  • [7] Automated extraction and parameterization of motions in large data sets
    Kovar, L
    Gleicher, M
    [J]. ACM TRANSACTIONS ON GRAPHICS, 2004, 23 (03): : 559 - 568
  • [8] Gene expression data clustering and visualization based on a binary hierarchical clustering framework
    Szeto, LK
    Liew, AWC
    Yan, H
    Tang, SS
    [J]. JOURNAL OF VISUAL LANGUAGES AND COMPUTING, 2003, 14 (04): : 341 - 362
  • [9] Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection
    Campello, Ricardo J. G. B.
    Moulavi, Davoud
    Zimek, Arthur
    Sander, Joerg
    [J]. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2015, 10 (01)
  • [10] A projection method for robust estimation and clustering in large data sets
    Pena, Daniel
    Prieto, Francisco J.
    [J]. DATA ANALYSIS, CLASSIFICATION AND THE FORWARD SEARCH, 2006, : 209 - +