Integrative clustering of high-dimensional data with joint and individual clusters

Cited by: 21
Authors
Hellton, Kristoffer H. [1,2]
Thoresen, Magne [1 ]
Affiliations
[1] Univ Oslo, Dept Biostat, Oslo Ctr Biostat & Epidemiol, N-0317 Oslo, Norway
[2] Univ Oslo, Inst Clin Med, Div Med & Lab Sci, N-1478 Lorenskog, Norway
Keywords
Clustering; Integrative genomics; Principal component analysis; Singular value decomposition; BREAST; MODEL
DOI
10.1093/biostatistics/kxw005
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
When measuring a range of genomic, epigenomic, and transcriptomic variables for the same tissue sample, an integrative approach to analysis can strengthen inference and lead to new insights. This is also the case when clustering patient samples, and several integrative clustering procedures have been proposed. Common to these methodologies is the restriction to a joint cluster structure that is equal in all data layers. We instead present Joint and Individual Clustering (JIC), a clustering extension of the Joint and Individual Variance Explained algorithm (JIVE), which enables the construction of both joint and data type-specific clusters simultaneously. The procedure builds on the connection between k-means clustering and principal component analysis; hence, the number of clusters can be determined by the number of relevant principal components. The proposed procedure is compared with iCluster, a method restricted to joint clusters only, and simulations show that JIC is clearly advantageous when both individual and joint clusters are present. The procedure is illustrated using gene expression and miRNA levels measured in breast cancer tissue from The Cancer Genome Atlas. The analysis suggests a division into three joint clusters common to both data types and two expression-specific clusters.
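The connection between k-means and principal component analysis mentioned in the abstract can be illustrated with a minimal sketch. It is not the JIC or JIVE procedure itself. It only shows the commonly used relaxation in which the sign of the first principal component score separates two clusters. The function name `pc_sign_clusters`, the simulated data, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def pc_sign_clusters(X):
    """Split samples into two groups by the sign of the first PC score.

    X has shape (n_samples, n_features). Returns one 0/1 label per sample.
    Hypothetical helper for illustration; not taken from the paper.
    """
    Xc = X - X.mean(axis=0)                    # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, 0] * s[0]                    # first principal component scores
    return (scores > 0).astype(int)


# Simulated high-dimensional data (assumed values): 40 samples, 200 features,
# two groups that differ by a mean shift in the first 20 features.
rng = np.random.default_rng(0)
n, p = 40, 200
truth = np.repeat([0, 1], n // 2)
shift = np.zeros(p)
shift[:20] = 3.0
X = rng.normal(size=(n, p)) + np.outer(truth, shift)

labels = pc_sign_clusters(X)

# Cluster labels are arbitrary, so compare up to a label swap.
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
print(f"agreement with simulated groups: {agreement:.2f}")
```

In this relaxation one principal component separates two clusters, which is why counting the relevant components gives a handle on the number of clusters. How the paper selects those components for the joint and individual parts is not specified in this record.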
Pages: 537-548
Number of pages: 12