A Dirichlet process mixture model for clustering longitudinal gene expression data

Cited by: 9
Authors
Sun, Jiehuan [1]
Herazo-Maya, Jose D. [2]
Kaminski, Naftali [2]
Zhao, Hongyu [1]
Warren, Joshua L. [1]
Affiliations
[1] Yale Univ, Dept Biostat, New Haven, CT 06520 USA
[2] Yale Sch Med, Pulm Crit Care & Sleep Med, New Haven, CT 06520 USA
Funding
National Institutes of Health (USA);
Keywords
Bayesian factor analysis; Bayesian nonparametrics; clustering; longitudinal gene expression study; HETEROGENEITY; CANCER;
DOI
10.1002/sim.7374
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Subgroup identification (clustering) is an important problem in biomedical research. Gene expression profiles are commonly used to define subgroups. Longitudinal gene expression profiles might provide more information on disease progression than baseline profiles alone. Therefore, subgroup identification could be more accurate and effective with the aid of longitudinal gene expression data. However, existing statistical methods are unable to fully utilize these data for patient clustering. In this article, we introduce a novel Bayesian clustering method based on longitudinal gene expression profiles. This method, called BClustLonG, adopts a linear mixed-effects framework to model gene expression trajectories over time, while clustering is conducted jointly on the regression coefficients obtained from all genes. To account for the correlations among genes and alleviate the challenges of high dimensionality, we adopt a factor analysis model for the regression coefficients. A Dirichlet process prior distribution is placed on the means of the regression coefficients to induce clustering. Through extensive simulation studies, we show that BClustLonG has improved performance over other clustering methods. When applied to a dataset of severely injured (burn or trauma) patients, our model is able to identify interesting subgroups. Copyright (c) 2017 John Wiley & Sons, Ltd.
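The abstract states that a Dirichlet process prior on the means of the regression coefficients induces clustering. The sketch below illustrates only that generic mechanism, using the Chinese restaurant process representation of a Dirichlet process; it is not the BClustLonG implementation. The function name `sample_dp_means`, the standard normal base distribution, and the values of `alpha`, `dim`, and `n_subjects` are all assumptions chosen for illustration.

```python
import numpy as np


def sample_dp_means(n_subjects, alpha, dim, rng):
    """Draw one mean vector per subject from a DP(alpha, G0) prior.

    Uses the Chinese restaurant process. G0 = N(0, I) is an illustrative
    assumption, not a choice taken from the paper.
    """
    labels = np.empty(n_subjects, dtype=int)
    atoms = []   # one mean vector per cluster
    counts = []  # number of subjects currently in each cluster
    for i in range(n_subjects):
        # An existing cluster k is chosen with probability proportional to
        # counts[k]; a new cluster is opened with probability proportional
        # to alpha.
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = int(rng.choice(len(probs), p=probs))
        if k == len(atoms):
            atoms.append(rng.standard_normal(dim))  # fresh draw from G0
            counts.append(1)
        else:
            counts[k] += 1
        labels[i] = k
    # Subjects with the same label share exactly the same mean vector,
    # which is what makes the prior a clustering prior.
    means = np.stack([atoms[k] for k in labels])
    return labels, means


rng = np.random.default_rng(0)
labels, means = sample_dp_means(n_subjects=30, alpha=1.0, dim=2, rng=rng)
n_clusters = len(np.unique(labels))
print(n_clusters, "clusters among", len(labels), "subjects")
```

Because draws from a Dirichlet process are discrete, several subjects end up with identical mean vectors, and the number of distinct values (the number of clusters) is not fixed in advance. Larger values of `alpha` tend to produce more clusters.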
Pages: 3495-3506
Page count: 12
Related papers
50 in total
  • [1] Hierarchical Dirichlet process model for gene expression clustering
    Wang, Liming
    Wang, Xiaodong
    [J]. EURASIP JOURNAL ON BIOINFORMATICS AND SYSTEMS BIOLOGY, 2013, (01)
  • [2] A Spatial Dirichlet Process Mixture Model for Clustering Population Genetics Data
    Reich, Brian J.
    Bondell, Howard D.
    [J]. BIOMETRICS, 2011, 67 (02) : 381 - 390
  • [3] Research on dirichlet process mixture model for clustering
    Zhang, Biyao
    Zhang, Kaisong
    Zhong, Luo
    Zhang, Xuanya
    [J]. Ingenierie des Systemes d'Information, 2019, 24 (02) : 183 - 189
  • [4] Object Clustering With Dirichlet Process Mixture Model for Data Association in Monocular SLAM
    Wei, Songlin
    Chen, Guodong
    Chi, Wenzheng
    Wang, Zhenhua
    Sun, Lining
    [J]. IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2023, 70 (01) : 594 - 603
  • [5] Clustering with label constrained Dirichlet process mixture model
    Burhanuddin, Nurul Afiqah
    Adam, Mohd Bakri
    Ibrahim, Kamarulzaman
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 107
  • [6] Graph Clustering Using Dirichlet Process Mixture Model
    Atastina, Imelda
    Sitohang, Benhard
    Putri, G. A. S.
    Moertini, Veronica S.
    [J]. PROCEEDINGS OF 2017 INTERNATIONAL CONFERENCE ON DATA AND SOFTWARE ENGINEERING (ICODSE), 2017,
  • [7] Clustering compositional data using Dirichlet mixture model
    Pal, Samyajoy
    Heumann, Christian
    [J]. PLOS ONE, 2022, 17 (05):
  • [8] Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data
    Prabhakaran, Sandhya
    Azizi, Elham
    Carr, Ambrose
    Pe'er, Dana
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [9] Sampling in Dirichlet Process Mixture Models for Clustering Streaming Data
    Dinari, Or
    Freifeld, Oren
    [J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151 : 818 - 835
  • [10] Dirichlet Process Mixture Models with Pairwise Constraints for Data Clustering
    Li C.
    Rana S.
    Phung D.
    Venkatesh S.
    [J]. Annals of Data Science, 2016, 3 (2) : 205 - 223