Joint Clustering With Correlated Variables

Cited by: 2
Authors
Zhang, Hongmei [1]
Zou, Yubo [2]
Terry, Will [1]
Karmaus, Wilfried [1]
Arshad, Hasan [3]
Affiliations
[1] Univ Memphis, Sch Publ Hlth, Memphis, TN 38152 USA
[2] Blue Cross Blue Shield South Carolina, Columbia, SC USA
[3] Univ Southampton, Fac Med, Southampton, Hants, England
Source
AMERICAN STATISTICIAN | 2019 / Vol. 73 / Issue 03
Keywords
Bayesian methods; Dirichlet process; Semiparametric modeling; BAYES; MODEL;
DOI
10.1080/00031305.2018.1424033
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract
Traditional clustering methods focus on grouping subjects or (dependent) variables assuming independence between the variables. Clusters formed through these approaches can potentially lack homogeneity. This article proposes a joint clustering method by which both variables and subjects are clustered. In each joint cluster (in general composed of a subset of variables and a subset of subjects), there exists a unique association between dependent variables and covariates of interest. To this end, a Bayesian method is designed, in which a semi-parametric model is used to evaluate any unknown relationships between possibly correlated variables and covariates of interest, and a Dirichlet process is used to cluster subjects. Compared to existing clustering techniques, the major novelty of the method exists in its ability to improve the homogeneity of clusters, along with the ability to take the correlations between variables into account. Via simulations, we examine the performance and efficiency of the proposed method. Applying the method to cluster allergens and subjects based on the association of wheal size in reaction to allergens with age, we found that a certain pattern of allergic sensitization to a set of allergens has a potential to reduce the occurrence of asthma.
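The abstract states that a Dirichlet process is used to cluster subjects. As a rough illustration of that single ingredient only, the sketch below draws subject-to-cluster labels from a Chinese restaurant process, a standard way to represent the partition implied by a Dirichlet process prior. It is not the authors' method: the semi-parametric model, the clustering of variables, and the posterior updates are all omitted, and the function name `crp_assignments` and the parameter names `alpha` and `seed` are assumptions made for this example.

```python
import random


def crp_assignments(n_subjects, alpha, seed=0):
    """Draw cluster labels for n_subjects from a Chinese restaurant process.

    alpha > 0 is the concentration parameter (hypothetical name):
    larger values tend to produce more clusters.
    """
    rng = random.Random(seed)
    labels = []  # labels[i] = cluster index assigned to subject i
    sizes = []   # sizes[k] = number of subjects currently in cluster k
    for _ in range(n_subjects):
        # An existing cluster k is chosen with weight sizes[k];
        # a brand-new cluster is opened with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[k] += 1     # join an existing cluster
        labels.append(k)
    return labels, sizes


if __name__ == "__main__":
    labels, sizes = crp_assignments(n_subjects=20, alpha=1.0, seed=1)
    print("labels:", labels)
    print("cluster sizes:", sizes)
```

The number of clusters is not fixed in advance; it grows with the data, which is the property that makes this kind of prior convenient when the number of subject groups is unknown.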
Pages: 296 - 306
Number of pages: 11
Related papers
50 in total
  • [31] CLUSTERING OF VARIABLES FOR MIXED DATA
    Saracco, J.
    Chavent, M.
    STATISTICS FOR ASTROPHYSICS: CLUSTERING AND CLASSIFICATION, 2016, 77 : 121 - 169
  • [32] BILINEAR FORMS IN NORMALLY CORRELATED VARIABLES
    CRAIG, AT
    ANNALS OF MATHEMATICAL STATISTICS, 1947, 18 (04): : 565 - 573
  • [33] Statistical tolerance synthesis with correlated variables
    Gonzalez, Isabel
    Sanchez, Ismael
    MECHANISM AND MACHINE THEORY, 2009, 44 (06) : 1097 - 1107
  • [34] ADDITIVELY CORRELATED RANDOM-VARIABLES
    NGUYENVANTHU
    BULLETIN DE L ACADEMIE POLONAISE DES SCIENCES-SERIE DES SCIENCES MATHEMATIQUES ASTRONOMIQUES ET PHYSIQUES, 1975, 23 (07): : 781 - 785
  • [35] DECOMPOSITION OF VARIABLES AND CORRELATED MEASUREMENT ERRORS
    LACH, S
    INTERNATIONAL ECONOMIC REVIEW, 1993, 34 (03) : 715 - 725
  • [36] A note on generating correlated binary variables
    Lunn, AD
    Davies, SJ
    BIOMETRIKA, 1998, 85 (02) : 487 - 490
  • [37] The Median Probability Model and Correlated Variables
    Barbieri, Maria M.
    Berger, James O.
    George, Edward, I
    Rockova, Veronika
    BAYESIAN ANALYSIS, 2021, 16 (04): : 1085 - 1112
  • [38] SIMULATION FOR RISK ANALYSIS WITH CORRELATED VARIABLES
    NGUYEN, VU
    CHOWDHURY, RN
    GEOTECHNIQUE, 1985, 35 (01): : 47 - 58
  • [39] On criteria for factorising correlated variables.
    Dodd, SC
    BIOMETRIKA, 1927, 19 : 45 - 52
  • [40] NOTE ON SAMPLING 2 CORRELATED VARIABLES
    KLEIJNEN, JP
    SIMULATION, 1974, 22 (02) : 45 - 46