Separating and reintegrating latent variables to improve classification of genomic data

Authors
Payne, Nora Yujia [1]
Gagnon-Bartsch, Johann A. [1]
Affiliations
[1] Univ Michigan, Dept Stat, 1085 S Univ Ave, Ann Arbor, MI 48109 USA
Funding
National Science Foundation (USA)
Keywords
Classification; Gene expression; Linear discriminant analysis; GENE-EXPRESSION; FEATURE-SELECTION; AIR-POLLUTION; METHYLATION; REGRESSION;
DOI
10.1093/biostatistics/kxab046
Chinese Library Classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Genomic data sets contain the effects of various unobserved biological variables in addition to the variable of primary interest. These latent variables often affect a large number of features (e.g., genes), giving rise to dense latent variation. This latent variation presents both challenges and opportunities for classification. While some of these latent variables may be partially correlated with the phenotype of interest and thus helpful, others may be uncorrelated and merely contribute additional noise. Moreover, whether potentially helpful or not, these latent variables may obscure weaker effects that impact only a small number of features but more directly capture the signal of primary interest. To address these challenges, we propose the cross-residualization classifier (CRC). Through an adjustment and ensemble procedure, the CRC estimates and residualizes out the latent variation, trains a classifier on the residuals, and then reintegrates the latent variation in a final ensemble classifier. Thus, the latent variables are accounted for without discarding any potentially predictive information. We apply the method to simulated data and a variety of genomic data sets from multiple platforms. In general, we find that the CRC performs well relative to existing classifiers and sometimes offers substantial gains.
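The pipeline outlined in the abstract (estimate the latent variation, residualize it out, classify on the residuals, then reintegrate the latent variation in an ensemble) can be illustrated with a rough sketch. This is not the authors' CRC algorithm: the abstract does not specify the estimator, the base classifier, or the ensemble rule, and the cross-fitting implied by the name "cross-residualization" is omitted here. The sketch assumes the top right singular vectors of the centered training data as the latent estimate, a nearest-centroid rule as the base classifier, and a sum of standardized scores as the ensemble. All function names are hypothetical.

```python
import numpy as np

def fit_latent(X, k):
    """Assumed estimator: top-k right singular vectors of the centered data."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]                      # V has shape (k, n_features)

def split(X, mean, V):
    """Separate samples into latent scores Z and residuals R."""
    Xc = X - mean
    Z = Xc @ V.T                             # coordinates along latent directions
    R = Xc - Z @ V                           # what remains after projecting them out
    return Z, R

def fit_centroids(F, y):
    """Nearest-centroid rule for binary labels (stand-in base classifier)."""
    return F[y == 0].mean(axis=0), F[y == 1].mean(axis=0)

def score(F, centroids):
    """Positive values favour class 1, negative values favour class 0."""
    c0, c1 = centroids
    return ((F - c0) ** 2).sum(axis=1) - ((F - c1) ** 2).sum(axis=1)

def fit_predict(X_train, y_train, X_test, k=2):
    """Train on latent scores and residuals separately, then combine."""
    mean, V = fit_latent(X_train, k)
    Z_tr, R_tr = split(X_train, mean, V)
    Z_te, R_te = split(X_test, mean, V)

    total = np.zeros(X_test.shape[0])
    for F_tr, F_te in ((Z_tr, Z_te), (R_tr, R_te)):
        cents = fit_centroids(F_tr, y_train)
        scale = score(F_tr, cents).std() + 1e-12   # put both scores on one scale
        total += score(F_te, cents) / scale        # assumed ensemble: simple sum
    return (total > 0).astype(int)

# Usage on synthetic data: one dense latent factor plus a sparse class signal.
rng = np.random.default_rng(0)
n, p = 60, 200
y = np.array([0, 1] * (n // 2))
X = rng.normal(size=(n, 1)) @ rng.normal(size=(1, p)) + rng.normal(size=(n, p))
X[:, :5] += 2.0 * y[:, None]                 # signal confined to five features
pred = fit_predict(X[:40], y[:40], X[40:], k=1)
```

Keeping the latent scores as a second ensemble member, rather than discarding them, reflects the abstract's point that latent variables may carry predictive information even when they obscure sparser signal.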
Pages: 1133-1149
Page count: 17