Classification of microarray data with penalized logistic regression

Times cited: 37
Authors
Eilers, PHC [1]
Boer, JM [1]
van Ommen, GJ [1]
van Houwelingen, HC [1]
Affiliation
[1] Leiden Univ, Med Ctr, Dept Med Stat, Leiden, Netherlands
Keywords
AIC; genetic expression; cross-validation; generalized linear models; multicollinearity; multivariate calibration; ridge regression; singular value decomposition
DOI
10.1117/12.427987
Chinese Library Classification number
R318 [Biomedical Engineering]
Subject classification code
0831
Abstract
Classification of microarray data needs a firm statistical basis. In principle, logistic regression can provide it, modeling the probability of membership of a class with (transforms of) linear combinations of explanatory variables. However, classical logistic regression does not work for microarrays, because there will generally be far more variables than observations. One problem is multicollinearity: the estimating equations become singular and have no unique and stable solution. A second problem is over-fitting: a model may fit a data set well but perform badly when used to classify new data. We propose penalized likelihood as a solution to both problems. The values of the regression coefficients are constrained in a way similar to ridge regression. All variables play an equal role; there is no ad hoc selection of "most relevant" or "most expressed" genes. The dimension of the resulting systems of equations is equal to the number of variables and will generally be too large for most computers, but it can be reduced dramatically with the singular value decomposition of some matrices. The penalty is optimized with AIC (Akaike's Information Criterion), which is essentially a measure of prediction performance. We find that penalized logistic regression performs well on a public data set (the MIT ALL/AML data).
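The abstract describes the fitting procedure only in words. Below is one minimal way to realize it in Python with NumPy. Several assumptions are not taken from the paper: class labels are coded 0/1, the ridge penalty is (λ/2)·Σ β_j² with an unpenalized intercept, fitting uses Newton-Raphson with step halving, and AIC is computed as deviance + 2 × effective dimension, where the effective dimension is the trace of the hat matrix. All function names are hypothetical, and the paper's own penalty scaling and SVD details may differ. The dimension reduction rests on one fact: a ridge solution always lies in the row space of X. With the thin SVD X = U D V^T, the model can therefore be fitted on the n × k matrix R = U D (k = min(n, p)) and mapped back to gene coefficients with β = V θ.

```python
import numpy as np


def sigmoid(eta):
    # Logistic function 1 / (1 + exp(-eta)), written with tanh to avoid overflow
    return 0.5 * (1.0 + np.tanh(0.5 * eta))


def penalized_loglik(A, y, gamma, P):
    # Binomial log-likelihood minus the quadratic (ridge) penalty 0.5 * gamma' P gamma
    eta = A @ gamma
    return np.sum(y * eta - np.logaddexp(0.0, eta)) - 0.5 * gamma @ P @ gamma


def fit_ridge_logistic(Z, y, lam, max_iter=100, tol=1e-8):
    """Ridge-penalized logistic regression fitted by Newton-Raphson.

    Z: n x m design without intercept column; y: 0/1 labels; lam: penalty weight.
    Returns (intercept, coefficients, effective_dimension, deviance).
    """
    n, m = Z.shape
    A = np.hstack([np.ones((n, 1)), Z])
    P = lam * np.eye(m + 1)
    P[0, 0] = 0.0  # the intercept is not penalized
    gamma = np.zeros(m + 1)
    obj = penalized_loglik(A, y, gamma, P)
    for _ in range(max_iter):
        mu = sigmoid(A @ gamma)
        w = mu * (1.0 - mu)
        H = A.T @ (A * w[:, None]) + P  # negative Hessian of the penalized log-likelihood
        g = A.T @ (y - mu) - P @ gamma  # gradient of the penalized log-likelihood
        step = np.linalg.solve(H, g)
        # Step halving: shrink the Newton step until the objective does not decrease
        t = 1.0
        new_obj = penalized_loglik(A, y, gamma + step, P)
        while new_obj < obj and t > 1e-8:
            t *= 0.5
            new_obj = penalized_loglik(A, y, gamma + t * step, P)
        gamma = gamma + t * step
        obj = new_obj
        if np.max(np.abs(t * step)) < tol:
            break
    eta = A @ gamma
    mu = sigmoid(eta)
    F = A.T @ (A * (mu * (1.0 - mu))[:, None])  # Fisher information at the fit
    ed = np.trace(np.linalg.solve(F + P, F))  # effective dimension = trace of hat matrix
    dev = -2.0 * np.sum(y * eta - np.logaddexp(0.0, eta))  # deviance for 0/1 data
    return gamma[0], gamma[1:], ed, dev


def reduce_design(X):
    """Thin SVD X = U D V^T; returns R = U D (n x k, k = min(n, p)) and V^T."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    return U * d, Vt  # U * d scales column j of U by d[j], so X == R @ Vt


def select_lambda(X, y, lambdas):
    """Return (aic, lam, intercept, beta) for the lam minimizing deviance + 2 * ED."""
    R, Vt = reduce_design(X)  # computed once and reused for every lam
    best = None
    for lam in lambdas:
        b0, theta, ed, dev = fit_ridge_logistic(R, y, lam)
        aic = dev + 2.0 * ed
        if best is None or aic < best[0]:
            best = (aic, lam, b0, Vt.T @ theta)  # beta = V theta: one coefficient per gene
    return best


# Illustration on synthetic data (not the ALL/AML set): 40 samples, 2000 "genes"
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 2000))
y = (X[:, :10].sum(axis=1) > 0).astype(float)
aic, lam, b0, beta = select_lambda(X, y, np.logspace(0, 3, 13))
prob = sigmoid(b0 + X @ beta)  # fitted class-membership probabilities
```

The reduced problem has only k + 1 unknowns, so each Newton step solves a system of at most n + 1 equations, however many genes are measured. The SVD is computed once and reused for every value on the λ grid. The synthetic data only illustrate the calls and say nothing about performance on the ALL/AML data.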
Pages: 187-198
Page count: 12
Related papers (50 in total)
  • [1] Penalized logistic regression with prior information for microarray gene expression classification
    Genc, Murat
    [J]. INTERNATIONAL JOURNAL OF BIOSTATISTICS, 2024, 20 (01) : 107 - 122
  • [2] Dimension reduction-based penalized logistic regression for cancer classification using microarray data
    Shen, L
    Tan, EC
    [J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2005, 2 (02) : 166 - 175
  • [3] Classification of gene microarrays by penalized logistic regression
    Zhu, J
    Hastie, T
    [J]. BIOSTATISTICS, 2004, 5 (03) : 427 - 443
  • [4] Classification using partial least squares with penalized logistic regression
    Fort, G
    Lambert-Lacroix, S
    [J]. BIOINFORMATICS, 2005, 21 (07) : 1104 - 1111
  • [5] Improving pattern classification of DNA microarray data by using PCA and logistic regression
    Ocampo-Vega, Ricardo
    Sanchez-Ante, Gildardo
    de Luna, Marco A.
    Vega, Roberto
    Falcon-Morales, Luis E.
    Sossa, Humberto
    [J]. INTELLIGENT DATA ANALYSIS, 2016, 20 : S53 - S67
  • [6] A Penalized Logistic Regression Approach to Detection Based Phone Classification
    Siniscalchi, Sabato Marco
    Svendsen, Torbjorn
    Lee, Chin-Hui
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008 : 2390 - 2393
  • [7] Regularized logistic regression without a penalty term: An application to cancer classification with microarray data
    Bielza, Concha
    Robles, Victor
    Larranaga, Pedro
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (05) : 5110 - 5118
  • [8] Incorporating Predictor Network in Penalized Regression with Application to Microarray Data
    Pan, Wei
    Xie, Benhuai
    Shen, Xiaotong
    [J]. BIOMETRICS, 2010, 66 (02) : 474 - 484
  • [9] Multiclass-penalized logistic regression
    Nibbering, Didier
    Hastie, Trevor J.
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2022, 169
  • [10] Penalized wavelet nonparametric univariate logistic regression for irregular spaced data
    Amato, Umberto
    Antoniadis, Anestis
    De Feis, Italia
    Gijbels, Irene
    [J]. STATISTICS, 2023, 57 (05) : 1037 - 1060