Integrative clustering of high-dimensional data with joint and individual clusters

Cited by: 21
Authors
Hellton, Kristoffer H. [1, 2]
Thoresen, Magne [1]
Affiliations
[1] Univ Oslo, Dept Biostat, Oslo Ctr Biostat & Epidemiol, N-0317 Oslo, Norway
[2] Univ Oslo, Inst Clin Med, Div Med & Lab Sci, N-1478 Lorenskog, Norway
Keywords
Clustering; Integrative genomics; Principal component analysis; Singular value decomposition; BREAST; MODEL
DOI
10.1093/biostatistics/kxw005
Chinese Library Classification number
Q [Biological sciences]
Discipline classification code
07; 0710; 09
Abstract
When measuring a range of genomic, epigenomic, and transcriptomic variables for the same tissue sample, an integrative approach to analysis can strengthen inference and lead to new insights. This is also the case when clustering patient samples, and several integrative cluster procedures have been proposed. Common for these methodologies is the restriction to a joint cluster structure, equal in all data layers. We instead present a clustering extension of the Joint and Individual Variance Explained algorithm (JIVE), Joint and Individual Clustering (JIC), enabling the construction of both joint and data type-specific clusters simultaneously. The procedure builds on the connection between k-means clustering and principal component analysis, and hence, the number of clusters can be determined by the number of relevant principal components. The proposed procedure is compared with iCluster, a method restricted to only joint clusters, and simulations show that JIC is clearly advantageous when both individual and joint clusters are present. The procedure is illustrated using gene expression and miRNA levels measured in breast cancer tissue from The Cancer Genome Atlas. The analysis suggests a division into three joint clusters common for both data types and two expression-specific clusters.
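The link between k-means clustering and principal component analysis mentioned in the abstract can be illustrated in the simplest two-cluster case, where the sign of each sample's first principal component score gives a relaxed k-means assignment. The following is a minimal sketch on simulated data, assuming NumPy. It is not the authors' JIC procedure, which builds on the JIVE decomposition across several data types; the function name `pc_sign_clusters` and all simulation settings are hypothetical.

```python
import numpy as np


def pc_sign_clusters(X):
    """Split samples (rows of X) into two groups by the sign of the first PC score.

    Illustrative only: this is the two-cluster relaxation of k-means,
    not the JIC algorithm described in the abstract.
    """
    Xc = X - X.mean(axis=0)                      # center each variable
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, 0] * s[0]                      # first principal component scores
    return (scores > 0).astype(int)              # sign gives the cluster label


# Hypothetical simulation: 100 samples, 200 variables, two groups that
# differ in the mean of the first 20 variables.
rng = np.random.default_rng(0)
n_per_group, p = 50, 200
truth = np.repeat([0, 1], n_per_group)
shift = np.zeros(p)
shift[:20] = 2.0
X = rng.normal(size=(2 * n_per_group, p)) + np.outer(truth, shift)

labels = pc_sign_clusters(X)

# The sign of a principal component is arbitrary, so agreement is
# measured up to swapping the two labels.
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
print(f"agreement with simulated groups: {agreement:.2f}")
```

With more than two clusters, several principal components would be needed; how the number of relevant components is chosen is specific to the paper and is not reproduced here.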
Pages: 537-548
Number of pages: 12
Related papers
50 in total
  • [21] Flexible clustering of high-dimensional data via mixtures of joint generalized hyperbolic distributions
    Tang, Yang
    Browne, Ryan R.
    McNicholas, Paul D.
    STAT, 2018, 7 (01):
  • [22] An efficient clustering method of data mining for high-dimensional data
    Chang, JW
    Kang, HM
    8TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL II, PROCEEDINGS: COMPUTING TECHNIQUES, 2004 : 273 - 278
  • [23] High-dimensional clustering method for high performance data mining
    Chang, Jae-Woo
    Lee, Hyun-Jo
    COMPUTATIONAL SCIENCE - ICCS 2007, PT 3, PROCEEDINGS, 2007, 4489 : 621 - +
  • [24] Bayesian variable selection in clustering high-dimensional data
    Tadesse, MG
    Sha, N
    Vannucci, M
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2005, 100 (470) : 602 - 617
  • [25] Subspace Clustering of High-Dimensional Data: An Evolutionary Approach
    Vijendra, Singh
    Laxman, Sahoo
    APPLIED COMPUTATIONAL INTELLIGENCE AND SOFT COMPUTING, 2013, 2013
  • [26] Functional clustering algorithm for high-dimensional proteomics data
    Bensmail, H
    Aruna, B
    Semmes, OJ
    Haoudi, A
    JOURNAL OF BIOMEDICINE AND BIOTECHNOLOGY, 2005, (02) : 80 - 86
  • [27] Density Conscious Subspace Clustering for High-Dimensional Data
    Chu, Yi-Hong
    Huang, Jen-Wei
    Chuang, Kun-Ta
    Yang, De-Nian
    Chen, Ming-Syan
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2010, 22 (01) : 16 - 30
  • [28] Evolutionary Subspace Clustering Algorithm for High-Dimensional Data
    Nourashrafeddin, S. N.
    Arnold, Dirk V.
    Milios, Evangelos
    PROCEEDINGS OF THE FOURTEENTH INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTATION COMPANION (GECCO'12), 2012 : 1497 - 1498
  • [29] Subspace clustering of high-dimensional data: a predictive approach
    Brian McWilliams
    Giovanni Montana
    Data Mining and Knowledge Discovery, 2014, 28 : 736 - 772
  • [30] Iterative random projections for high-dimensional data clustering
    Cardoso, Angelo
    Wichert, Andreas
    PATTERN RECOGNITION LETTERS, 2012, 33 (13) : 1749 - 1755