A Composite Model for Subgroup Identification and Prediction via Bicluster Analysis

Cited by: 2
Authors
Chen, Hung-Chia [1 ,2 ,3 ]
Zou, Wen [1 ]
Lu, Tzu-Pin [1 ,4 ]
Chen, James J. [1 ,2 ,3 ]
Affiliations
[1] US FDA, Natl Ctr Toxicol Res, Div Bioinformat & Biostat, Jefferson, AR 72079 USA
[2] China Med Univ, Grad Inst Biostat, Taichung, Taiwan
[3] China Med Univ, Ctr Biostat, Taichung, Taiwan
[4] Natl Taiwan Univ, Grad Inst Epidemiol & Prevent Med, Dept Publ Hlth, Taipei 10764, Taiwan
Source
PLOS ONE | 2014 / Vol. 9 / No. 10
Keywords
FIELD GEL-ELECTROPHORESIS; HIGH-DIMENSIONAL DATA; SINGULAR-VALUE DECOMPOSITION; GENE-EXPRESSION DATA; ACUTE LYMPHOBLASTIC-LEUKEMIA; SALMONELLA-SEROTYPES; CLUSTER-ANALYSIS; MICROARRAY DATA; CLASSIFICATION; PATTERNS;
DOI
10.1371/journal.pone.0111318
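Chinese Library Classification (CLC) codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification codes
07; 0710; 09
Abstract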
Background: A major challenge in the analysis of large and complex biomedical data is to develop an approach for 1) identifying distinct subgroups in the sampled populations, 2) characterizing the relationships among subgroups, and 3) developing a prediction model that classifies the subgroup memberships of new samples using a set of predictors. Each subgroup can represent a different pathogen serotype of a microorganism, a different tumor subtype in cancer patients, or a different genetic makeup of patients related to treatment response.

Methods: This paper proposes a composite model for subgroup identification and prediction using biclusters. A biclustering technique is first used to identify a set of biclusters from the sampled data. For each bicluster, a subgroup-specific binary classifier is built to determine whether a particular sample is inside or outside the bicluster. A composite model, which consists of all the binary classifiers, is then constructed to classify samples into several disjoint subgroups. The proposed composite model depends neither on a specific biclustering algorithm or bicluster pattern nor on a specific classification algorithm.

Results: The composite model achieved an overall accuracy of 97.4% on a synthetic dataset consisting of four subgroups. The model was then applied to two datasets in which the samples' subgroup memberships were known. The procedure showed 83.7% accuracy in discriminating between the adenocarcinoma and squamous cell carcinoma subtypes of lung cancer, and it identified 5 serotypes and several subtypes with about 94% accuracy in a pathogen dataset.

Conclusion: The composite model presents a novel approach to developing a biclustering-based classification model from unlabeled sampled data. The proposed approach combines unsupervised biclustering and supervised classification techniques to classify samples into disjoint subgroups based on their associated attributes, such as genotypic factors, phenotypic outcomes, efficacy/safety measures, or responses to treatments. The procedure is useful for identifying unknown species or new biomarkers for targeted therapy.
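The Methods pipeline (biclusters, then one inside/outside binary classifier per bicluster, then a composite model over all classifiers) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the biclusters are supplied as given input (any biclustering algorithm could produce them), each binary classifier is a nearest-centroid rule on that bicluster's own features (a stand-in, since the model is described as classifier-agnostic), and conflicts are resolved by the largest margin, with samples claimed by no classifier left unassigned. The names `BinaryCentroidClassifier`, `build_composite`, and `predict` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Set, Tuple

Vector = Sequence[float]
# A bicluster is given as (set of sample indices, list of feature indices).
Bicluster = Tuple[Set[int], List[int]]


def _centroid(rows: List[Vector], features: List[int]) -> List[float]:
    """Mean of each selected feature column over the given rows."""
    return [sum(r[j] for r in rows) / len(rows) for j in features]


def _dist(x: Vector, c: List[float], features: List[int]) -> float:
    """Euclidean distance between x (restricted to `features`) and centroid c."""
    return math.sqrt(sum((x[j] - cj) ** 2 for j, cj in zip(features, c)))


class BinaryCentroidClassifier:
    """Hypothetical subgroup-specific classifier: inside vs. outside one bicluster.

    Stand-in choice (assumption): nearest centroid on the bicluster's own features.
    """

    def __init__(self, X: List[Vector], inside: Set[int], features: List[int]):
        outside = [X[i] for i in range(len(X)) if i not in inside]
        if not inside or not outside:
            raise ValueError("bicluster must leave samples both inside and outside")
        self.features = features
        self.c_in = _centroid([X[i] for i in inside], features)
        self.c_out = _centroid(outside, features)

    def margin(self, x: Vector) -> float:
        """Positive when x is closer to the inside centroid than to the outside one."""
        return _dist(x, self.c_out, self.features) - _dist(x, self.c_in, self.features)


def build_composite(X: List[Vector], biclusters: List[Bicluster]) -> List[BinaryCentroidClassifier]:
    """One binary classifier per bicluster; together they form the composite model."""
    return [BinaryCentroidClassifier(X, rows, cols) for rows, cols in biclusters]


def predict(composite: List[BinaryCentroidClassifier], x: Vector) -> Optional[int]:
    """Index of the single subgroup assigned to x, or None if no classifier claims it.

    Assumption: when several classifiers claim x, the largest margin wins,
    so every sample lands in at most one (disjoint) subgroup.
    """
    margins = [clf.margin(x) for clf in composite]
    best = max(range(len(margins)), key=margins.__getitem__)
    return best if margins[best] > 0 else None


# Toy data: samples 0-2 are high on features 0-1, samples 3-5 on features 2-3.
X = [
    [5.0, 5.1, 0.0, 0.2],
    [4.8, 5.2, 0.1, 0.0],
    [5.2, 4.9, 0.3, 0.1],
    [0.1, 0.2, 5.0, 4.9],
    [0.0, 0.3, 5.1, 5.2],
    [0.2, 0.1, 4.9, 5.0],
]
# Biclusters as a biclustering step might report them (supplied by hand here).
biclusters = [({0, 1, 2}, [0, 1]), ({3, 4, 5}, [2, 3])]
model = build_composite(X, biclusters)

print(predict(model, [5.0, 5.0, 0.0, 0.0]))  # 0
print(predict(model, [0.0, 0.0, 5.0, 5.0]))  # 1
print(predict(model, [0.0, 0.0, 0.0, 0.0]))  # None
```

Swapping in a different binary classifier only requires replacing `BinaryCentroidClassifier` with any object that exposes a comparable `margin` score, which mirrors the abstract's point that the composite model does not depend on a particular classification algorithm.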
Pages: 14
Related papers
50 in total
  • [1] MODEL APPROXIMATIONS VIA PREDICTION ERROR IDENTIFICATION
    ANDERSON, BDO
    MOORE, JB
    HAWKES, RM
    [J]. AUTOMATICA, 1978, 14 (06) : 615 - 622
  • [2] Bicluster Analysis of Heterogeneous Panel Data via M-Estimation
    Cui, Weijie
    Li, Yong
    [J]. MATHEMATICS, 2023, 11 (10)
  • [3] Subgroup identification in dose-finding trials via model-based recursive partitioning
    Thomas, Marius
    Bornkamp, Bjoern
    Seibold, Heidi
    [J]. STATISTICS IN MEDICINE, 2018, 37 (10) : 1608 - 1624
  • [4] Subgroup Analysis via Recursive Partitioning
    Su, Xiaogang
    Tsai, Chih-Ling
    Wang, Hansheng
    Nickerson, David M.
    Li, Bogong
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2009, 10 : 141 - 158
  • [5] Subgroup Analysis for Longitudinal Data via Semiparametric Additive Mixed Effects Model
    Bo, Xiaolin
    Zhang, Weiping
    [J]. JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY, 2023, 36 (05) : 2155 - 2185
  • [6] Change plane model averaging for subgroup identification
    Liu, Pan
    Li, Jialiang
    Kosorok, Michael R.
    [J]. STATISTICAL METHODS IN MEDICAL RESEARCH, 2023, 32 (04) : 773 - 788
  • [7] A new measles epidemic model: analysis, identification and prediction
    Di Giamberardino, Paolo
    Iacoviello, Daniela
    [J]. 2020 28TH MEDITERRANEAN CONFERENCE ON CONTROL AND AUTOMATION (MED), 2020, : 484 - 489
  • [8] Subgroup causal effect identification and estimation via matching tree
    Zhang, Yuyang
    Schnell, Patrick
    Song, Chi
    Huang, Bin
    Lu, Bo
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2021, 159 (159)