Mixture models with multiple levels, with application to the analysis of multifactor gene expression data

Cited by: 6
Authors
Joernsten, Rebecka [1]
Keles, Suenduez [2]
Affiliations
[1] Rutgers State Univ, Dept Stat, Piscataway, NJ 08854 USA
[2] Univ Wisconsin, Dept Stat, Dept Biostat & Med Bioinformat, Madison, WI 53706 USA
Keywords
clustering; gene expression; mixture model; model selection; profile expectation-maximization;
DOI
10.1093/biostatistics/kxm051
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Model-based clustering is a popular tool for summarizing high-dimensional data. With the number of high-throughput large-scale gene expression studies still on the rise, the need for effective data-summarizing tools has never been greater. By grouping genes according to a common experimental expression profile, we may gain new insight into the biological pathways that steer biological processes of interest. Clustering of gene profiles can also assist in assigning functions to genes that have not yet been functionally annotated. In this paper, we propose 2 model selection procedures for model-based clustering. Model selection in model-based clustering has to date focused on the identification of data dimensions that are relevant for clustering. However, in more complex data structures, with multiple experimental factors, such an approach does not provide easily interpreted clustering outcomes. We propose a mixture model with multiple levels, MIX(L), that provides sparse representations both "within" and "between" cluster profiles. We explore various flexible "within-cluster" parameterizations and discuss how efficient parameterizations can greatly enhance the objective interpretability of the generated clusters. Moreover, we allow for a sparse "between-cluster" representation with a different number of clusters at different levels of an experimental factor of interest. This enhances interpretability of clusters generated in multiple-factor contexts. Interpretable cluster profiles can assist in detecting biologically relevant groups of genes that may be missed with less efficient parameterizations. We use our multilevel mixture model to mine a proliferating cell line expression data set for annotational context and regulatory motifs. We also investigate the performance of the multilevel clustering approach on several simulated data sets.
Pages: 540-554
Number of pages: 15
Related papers
50 items in total
  • [1] Application of Gene Shaving and Mixture Models to Cluster Microarray Gene Expression Data
    Do, K-A.
    McLachlan, G.
    Bean, R.
    Wen, S.
    [J]. CANCER INFORMATICS, 2007, 5 : 25 - 43
  • [2] Clustering of gene expression data by mixture of PCA models
    Yoshioka, T
    Morioka, R
    Kobayashi, K
    Oba, S
    Ogasawara, N
    Ishii, S
    [J]. ARTIFICIAL NEURAL NETWORKS - ICANN 2002, 2002, 2415 : 522 - 527
  • [3] Hierarchical inverse Gaussian models and multiple testing: Application to gene expression data
    Labbe, A
    Thompson, M
    [J]. STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY, 2005, 4
  • [4] Application of finite mixture models for vehicle crash data analysis
    Park, Byung-Jung
    Lord, Dominique
    [J]. ACCIDENT ANALYSIS AND PREVENTION, 2009, 41 (04) : 683 - 691
  • [5] Data Envelopment Analysis and Multifactor Asset Pricing Models
    Solorzano-Taborga, Pablo
    Belen Alonso-Conde, Ana
    Rojo-Suarez, Javier
    [J]. INTERNATIONAL JOURNAL OF FINANCIAL STUDIES, 2020, 8 (02):
  • [6] A mixture model approach for the analysis of microarray gene expression data
    Allison, DB
    Gadbury, GL
    Heo, MS
    Fernández, JR
    Lee, CK
    Prolla, TA
    Weindruch, R
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2002, 39 (01) : 1 - 20
  • [7] Analysis of Microarray Gene Expression Data Using a Mixture Model
    Bartolucci, Al
    Allison, David B.
    Bae, Sejong
    Singh, Karan P.
    [J]. MODSIM 2007: INTERNATIONAL CONGRESS ON MODELLING AND SIMULATION: LAND, WATER AND ENVIRONMENTAL MANAGEMENT: INTEGRATED SYSTEMS FOR SUSTAINABILITY, 2007, : 2867 - 2869
  • [8] A mixture model approach for the analysis of microarray gene expression data
    Allison, David B.
    Gadbury, Gary L.
    Heo, Moonseong
    Fernández, José R.
    Lee, Cheol-Koo
    Prolla, Tomas A.
    Weindruch, Richard
    [J]. Computational Statistics and Data Analysis, 2002, 38 (05) : 1 - 20
  • [9] Mixture model on the variance for the differential analysis of gene expression data
    Delmar, P
    Robin, S
    Tronik-Le Roux, D
    Daudin, JJ
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS, 2005, 54 : 31 - 50
  • [10] Cluster analysis using multivariate normal mixture models to detect differential gene expression with microarray data
    He, Yi
    Pan, Wei
    Lin, Jizhen
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2006, 51 (02) : 641 - 658