Integrative clustering of high-dimensional data with joint and individual clusters

Cited by: 21
Authors
Hellton, Kristoffer H. [1 ,2 ]
Thoresen, Magne [1 ]
Affiliations
[1] Univ Oslo, Dept Biostat, Oslo Ctr Biostat & Epidemiol, N-0317 Oslo, Norway
[2] Univ Oslo, Inst Clin Med, Div Med & Lab Sci, N-1478 Lorenskog, Norway
Keywords
Clustering; Integrative genomics; Principal component analysis; Singular value decomposition; BREAST; MODEL;
DOI
10.1093/biostatistics/kxw005
Chinese Library Classification (CLC) number
Q [Biological sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
When measuring a range of genomic, epigenomic, and transcriptomic variables for the same tissue sample, an integrative approach to analysis can strengthen inference and lead to new insights. This is also the case when clustering patient samples, and several integrative cluster procedures have been proposed. Common for these methodologies is the restriction to a joint cluster structure, equal in all data layers. We instead present a clustering extension of the Joint and Individual Variance Explained algorithm (JIVE), Joint and Individual Clustering (JIC), enabling the construction of both joint and data type-specific clusters simultaneously. The procedure builds on the connection between k-means clustering and principal component analysis, and hence, the number of clusters can be determined by the number of relevant principal components. The proposed procedure is compared with iCluster, a method restricted to only joint clusters, and simulations show that JIC is clearly advantageous when both individual and joint clusters are present. The procedure is illustrated using gene expression and miRNA levels measured in breast cancer tissue from The Cancer Genome Atlas. The analysis suggests a division into three joint clusters common for both data types and two expression-specific clusters.
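The connection between k-means clustering and principal component analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the JIC or JIVE procedure: it only shows, on simulated data, that for two well-separated groups the sign of the first principal component score recovers the grouping. The function names (`pc_sign_clusters`, `agreement`) and all simulation settings (sample size, number of variables, mean shift) are assumptions chosen for illustration.

```python
import numpy as np

def pc_sign_clusters(X):
    """Split samples into two groups by the sign of the first PC score."""
    Xc = X - X.mean(axis=0)                      # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, 0] * s[0]                      # first PC score per sample
    return (scores > 0).astype(int)

def agreement(a, b):
    """Fraction of matching labels, allowing the two labels to be swapped."""
    match = np.mean(a == b)
    return max(match, 1.0 - match)

# Simulated high-dimensional data (assumed settings, not from the paper):
# 100 samples, 500 variables, two groups differing in the first 50 variables.
rng = np.random.default_rng(0)
n, p = 100, 500
truth = np.repeat([0, 1], n // 2)
shift = np.zeros(p)
shift[:50] = 2.0
X = rng.normal(size=(n, p)) + np.outer(truth, shift)

labels = pc_sign_clusters(X)
print("agreement with simulated groups:", agreement(labels, truth))
```

With one relevant principal component the samples split into two groups. The sketch does not attempt the decomposition into joint and individual components that the abstract describes.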
Pages: 537-548
Number of pages: 12
Related papers
50 in total
  • [1] Integrative clustering methods for high-dimensional molecular data
    Chalise, Prabhakar
    Koestler, Devin C.
    Bimali, Milan
    Yu, Qing
    Fridley, Brooke L.
    TRANSLATIONAL CANCER RESEARCH, 2014, 3 (03) : 202 - 216
  • [2] Individual Data Protected Integrative Regression Analysis of High-Dimensional Heterogeneous Data
    Cai, Tianxi
    Liu, Molei
    Xia, Yin
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2022, 117 (540) : 2105 - 2119
  • [3] High-dimensional data clustering
    Bouveyron, C.
    Girard, S.
    Schmid, C.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 52 (01) : 502 - 519
  • [4] Clustering High-Dimensional Data
    Masulli, Francesco
    Rovetta, Stefano
    CLUSTERING HIGH-DIMENSIONAL DATA, CHDD 2012, 2015, 7627 : 1 - 13
  • [5] Integrative analysis of individual-level data and high-dimensional summary statistics
    Fu, Sheng
    Deng, Lu
    Zhang, Han
    Qin, Jing
    Yu, Kai
    BIOINFORMATICS, 2023, 39 (04)
  • [6] Clustering of High-Dimensional and Correlated Data
    McLachlan, Geoffrey J.
    Ng, Shu-Kay
    Wang, K.
    DATA ANALYSIS AND CLASSIFICATION, 2010, : 3 - 11
  • [7] Clustering in high-dimensional data spaces
    Murtagh, FD
    STATISTICAL CHALLENGES IN ASTRONOMY, 2003, : 279 - 292
  • [8] Compressive Clustering of High-dimensional Data
    Ruta, Andrzej
    Porikli, Fatih
    2012 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2012), VOL 1, 2012, : 380 - 385
  • [9] Sparse Kernel Clustering of Massive High-Dimensional Data sets with Large Number of Clusters
    Chitta, Radha
    Jain, Anil K.
    Jin, Rong
    PIKM'15: PROCEEDINGS OF THE 8TH PH.D. WORKSHOP IN INFORMATION AND KNOWLEDGE MANAGEMENT, 2015, : 11 - 18
  • [10] An effective clustering scheme for high-dimensional data
    He, Xuansen
    He, Fan
    Fan, Yueping
    Jiang, Lingmin
    Liu, Runzong
    Maalla, Allam
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (15) : 45001 - 45045