Joint Clustering With Correlated Variables

Cited by: 2
Authors
Zhang, Hongmei [1]
Zou, Yubo [2]
Terry, Will [1]
Karmaus, Wilfried [1]
Arshad, Hasan [3]
Affiliations
[1] Univ Memphis, Sch Publ Hlth, Memphis, TN 38152 USA
[2] Blue Cross Blue Shield South Carolina, Columbia, SC USA
[3] Univ Southampton, Fac Med, Southampton, Hants, England
Source
AMERICAN STATISTICIAN | 2019, Vol. 73, No. 3
Keywords
Bayesian methods; Dirichlet process; Semiparametric modeling; BAYES; MODEL;
DOI
10.1080/00031305.2018.1424033
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Traditional clustering methods focus on grouping subjects or (dependent) variables, assuming independence between the variables. Clusters formed through these approaches can potentially lack homogeneity. This article proposes a joint clustering method by which both variables and subjects are clustered. In each joint cluster (in general composed of a subset of variables and a subset of subjects), there exists a unique association between dependent variables and covariates of interest. To this end, a Bayesian method is designed, in which a semi-parametric model is used to evaluate any unknown relationships between possibly correlated variables and covariates of interest, and a Dirichlet process is used to cluster subjects. Compared to existing clustering techniques, the major novelty of the method lies in its ability to improve the homogeneity of clusters and to take the correlations between variables into account. Via simulations, we examine the performance and efficiency of the proposed method. Applying the method to cluster allergens and subjects based on the association of wheal size in reaction to allergens with age, we found that a certain pattern of allergic sensitization to a set of allergens has the potential to reduce the occurrence of asthma.
Pages: 296-306
Page count: 11