Joint Clustering With Correlated Variables

Cited by: 2
Authors
Zhang, Hongmei [1]
Zou, Yubo [2]
Terry, Will [1]
Karmaus, Wilfried [1]
Arshad, Hasan [3]
Affiliations
[1] Univ Memphis, Sch Publ Hlth, Memphis, TN 38152 USA
[2] Blue Cross Blue Shield South Carolina, Columbia, SC USA
[3] Univ Southampton, Fac Med, Southampton, Hants, England
Source
AMERICAN STATISTICIAN | 2019 / Vol. 73 / Issue 03
Keywords
Bayesian methods; Dirichlet process; Semiparametric modeling; BAYES; MODEL;
DOI
10.1080/00031305.2018.1424033
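Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code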
020208 ; 070103 ; 0714 ;
Abstract
Traditional clustering methods focus on grouping subjects or (dependent) variables assuming independence between the variables. Clusters formed through these approaches can potentially lack homogeneity. This article proposes a joint clustering method by which both variables and subjects are clustered. In each joint cluster (in general composed of a subset of variables and a subset of subjects), there exists a unique association between dependent variables and covariates of interest. To this end, a Bayesian method is designed, in which a semi-parametric model is used to evaluate any unknown relationships between possibly correlated variables and covariates of interest, and a Dirichlet process is used to cluster subjects. Compared to existing clustering techniques, the major novelty of the method exists in its ability to improve the homogeneity of clusters, along with the ability to take the correlations between variables into account. Via simulations, we examine the performance and efficiency of the proposed method. Applying the method to cluster allergens and subjects based on the association of wheal size in reaction to allergens with age, we found that a certain pattern of allergic sensitization to a set of allergens has a potential to reduce the occurrence of asthma.
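The abstract states that a Dirichlet process is used to cluster subjects but gives no sampler. As a rough illustration of why a Dirichlet process prior produces clusters at all, the sketch below draws one partition of subjects from the Chinese restaurant process, a standard representation of that prior. It is not the authors' joint clustering method: it ignores the semi-parametric model, the variables, and the covariates. The function name `crp_partition` and the concentration parameter `alpha` are assumptions introduced here for illustration.

```python
import random


def crp_partition(n_subjects, alpha, seed=None):
    """Draw cluster labels for n_subjects from a Chinese restaurant process.

    Illustrative only: this samples from the prior over partitions implied by
    a Dirichlet process with concentration `alpha` (hypothetical parameter
    name). It does not use any data or likelihood.
    """
    rng = random.Random(seed)
    labels = []  # labels[i] = cluster index assigned to subject i
    counts = []  # counts[k] = number of subjects currently in cluster k
    for _ in range(n_subjects):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1  # join an existing cluster
        labels.append(k)
    return labels


if __name__ == "__main__":
    # Larger alpha tends to give more clusters; the draw itself is random.
    print(crp_partition(20, alpha=1.0, seed=0))
```

In a full Bayesian analysis, each subject's assignment would additionally be weighted by how well the cluster's model explains that subject's data; the sketch covers only the prior step.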
Pages: 296-306
Page count: 11