Partial least squares and logistic regression random-effects estimates for gene selection in supervised classification of gene expression data

Times cited: 9
Authors
Gusnanto, Arief [1, 2]
Ploner, Alexander [2]
Shuweihdi, Farag [1]
Pawitan, Yudi [2]
Affiliations
[1] Univ Leeds, Dept Stat, Leeds LS2 9JT, W Yorkshire, England
[2] Karolinska Inst, Dept Med Epidemiol & Biostat, Stockholm, Sweden
Keywords
Supervised classification; Gene selection; Filtering; Partial least squares; Logistic regression; Random effects; Microarray data; Cancer classification; Breast cancer; Discrimination; Bias
DOI
10.1016/j.jbi.2013.05.008
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Our main interest in supervised classification of gene expression data is to infer whether the expressions can discriminate biological characteristics of samples. With thousands of gene expressions to consider, gene selection has been advocated to reduce classification error by including only the discriminating genes. We propose to base the gene selection on partial least squares (PLS) and logistic regression random-effects (RE) estimates before the selected genes are evaluated in classification models. We compare this selection with selection based on two-sample t-statistics, a current practice, and on modified t-statistics. The results indicate that gene selection based on logistic regression RE estimates is recommended in general, while selection based on PLS estimates is recommended when the number of samples is low. Gene selection based on modified t-statistics performs well when the genes exhibit moderate-to-high variability with moderate group separation. Respecting the characteristics of the data is a key consideration in gene selection. (C) 2013 Elsevier Inc. All rights reserved.
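The filtering step compared in the abstract amounts to scoring every gene against the class labels and keeping the top-ranked ones. The following is a minimal Python sketch of that idea (standard library only). It is an illustration under stated assumptions, not the authors' implementation. The function names are hypothetical. The "modified" t-statistic is assumed to be a SAM-style variant that adds a constant `s0` to the standard error. The PLS score is taken to be each gene's entry of X_c^T y_c (centred data), which is proportional to the first PLS weight vector for a single response. The logistic regression RE estimates are not sketched here.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(values, labels, s0=0.0):
    """Pooled-variance two-sample t-statistic for one gene (class 1 minus class 0).

    With s0 > 0 a constant is added to the standard error, giving a SAM-style
    "modified" t-statistic (assumed form; the paper's modification may differ).
    """
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    n1, n0 = len(g1), len(g0)
    if n1 < 2 or n0 < 2:
        raise ValueError("each class needs at least two samples")
    # Pooled within-class variance
    sp2 = ((n1 - 1) * variance(g1) + (n0 - 1) * variance(g0)) / (n1 + n0 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n0))
    return (mean(g1) - mean(g0)) / (se + s0)


def pls_score(values, labels):
    """This gene's entry of X_c^T y_c (centred data), proportional to the
    first PLS weight vector when there is a single response."""
    xbar, ybar = mean(values), mean(labels)
    return sum((x - xbar) * (lab - ybar) for x, lab in zip(values, labels))


def select_genes(expr, labels, score, k):
    """Indices of the k genes (rows of expr) with the largest |score|."""
    scores = [score(gene, labels) for gene in expr]
    ranked = sorted(range(len(expr)), key=lambda i: abs(scores[i]), reverse=True)
    return ranked[:k]


# Toy data (made up for illustration): 3 genes x 6 samples, 0/1 class labels.
expr = [
    [5.1, 4.9, 5.0, 7.2, 7.0, 7.4],  # gene 0: higher in class 1
    [3.0, 3.2, 2.9, 3.1, 3.0, 2.8],  # gene 1: no real shift
    [6.0, 6.5, 5.5, 4.0, 4.8, 4.2],  # gene 2: lower in class 1
]
labels = [0, 0, 0, 1, 1, 1]

print(select_genes(expr, labels, two_sample_t, 2))                             # [0, 2]
print(select_genes(expr, labels, lambda g, y: two_sample_t(g, y, s0=1.0), 2))  # [0, 2]
print(select_genes(expr, labels, pls_score, 2))                                # [0, 2]
```

On this toy data all three scores pick the same genes. They can disagree on real data, where variability and group separation differ across genes. When a classifier is evaluated by cross-validation, the selection step is normally repeated inside each fold. Selecting genes once on the full data biases the estimated error downward.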
Pages: 697-709
Page count: 13