Classification of microarray data with penalized logistic regression

Cited by: 37
Authors
Eilers, PHC [1 ]
Boer, JM [1 ]
van Ommen, GJ [1 ]
van Houwelingen, HC [1 ]
Affiliations
[1] Leiden Univ, Med Ctr, Dept Med Stat, Leiden, Netherlands
Keywords
AIC; genetic expression; cross-validation; generalized linear models; multicollinearity; multivariate calibration; ridge regression; singular value decomposition;
DOI
10.1117/12.427987
Chinese Library Classification number
R318 [Biomedical Engineering];
Discipline classification code
0831 ;
Abstract
Classification of microarray data needs a firm statistical basis. In principle, logistic regression can provide it by modeling the probability of class membership with (transforms of) linear combinations of explanatory variables. However, classical logistic regression does not work for microarrays, because there will generally be far more variables than observations. One problem is multicollinearity: the estimating equations become singular and have no unique and stable solution. A second problem is over-fitting: a model may fit a data set well but perform badly when used to classify new data. We propose penalized likelihood as a solution to both problems. The values of the regression coefficients are constrained in a way similar to ridge regression. All variables play an equal role; there is no ad hoc selection of "most relevant" or "most expressed" genes. The dimension of the resulting systems of equations is equal to the number of variables and will generally be too large for most computers, but it can be reduced dramatically with the singular value decomposition of some matrices. The penalty is optimized with AIC (Akaike's Information Criterion), which is essentially a measure of prediction performance. We find that penalized logistic regression performs well on a public data set (the MIT ALL/AML data).
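The approach summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a ridge penalty of the form (lam / 2) * ||beta||^2 with an unpenalized intercept, a plain Newton-Raphson fit, and AIC defined as deviance plus twice an effective dimension taken as the trace of the hat matrix. The function names `fit_ridge_logistic_svd` and `select_lambda_by_aic` are hypothetical. The key step is that, with X = U D V^T, the penalized solution can be written as beta = V gamma, so the fit only needs the n x r matrix Z = U D instead of the n x p matrix X.

```python
import numpy as np


def fit_ridge_logistic_svd(X, y, lam, n_iter=50, tol=1e-8):
    """Ridge-penalized logistic regression fitted in SVD-reduced space.

    Assumed (not from the source): penalty (lam / 2) * ||beta||^2,
    unpenalized intercept, AIC = deviance + 2 * effective dimension.
    Returns (intercept, beta, aic, effective_dimension).
    """
    n = X.shape[0]
    # Thin SVD: X = U diag(d) Vt, with at most min(n, p) components.
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    keep = d > 1e-10 * d[0]            # drop numerically zero directions
    U, d, Vt = U[:, keep], d[keep], Vt[keep]
    Z = U * d                          # n x r reduced design, Z = U D
    r = Z.shape[1]

    A = np.hstack([np.ones((n, 1)), Z])  # intercept column + reduced design
    P = lam * np.eye(r + 1)
    P[0, 0] = 0.0                      # the intercept is not penalized

    def weighted_cross_product(theta):
        mu = 1.0 / (1.0 + np.exp(-(A @ theta)))
        w = mu * (1.0 - mu)            # logistic variance weights
        return mu, A.T @ (A * w[:, None])

    theta = np.zeros(r + 1)
    for _ in range(n_iter):
        mu, AtWA = weighted_cross_product(theta)
        grad = A.T @ (y - mu) - P @ theta      # penalized score
        step = np.linalg.solve(AtWA + P, grad)  # Newton-Raphson step
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break

    mu, AtWA = weighted_cross_product(theta)
    # Effective dimension: trace of the hat matrix of the penalized fit.
    eff_dim = np.trace(np.linalg.solve(AtWA + P, AtWA))
    eps = 1e-12                        # guards the logarithms
    deviance = -2.0 * np.sum(
        y * np.log(mu + eps) + (1.0 - y) * np.log(1.0 - mu + eps)
    )
    aic = deviance + 2.0 * eff_dim

    beta = Vt.T @ theta[1:]            # map gamma back to all p variables
    return theta[0], beta, aic, eff_dim


def select_lambda_by_aic(X, y, lambdas):
    """Return the penalty value from `lambdas` with the smallest AIC."""
    aics = [fit_ridge_logistic_svd(X, y, lam)[2] for lam in lambdas]
    return lambdas[int(np.argmin(aics))]
```

With n samples and p genes, the linear system solved in each iteration has at most n + 1 unknowns rather than p + 1, which is the dimension reduction the abstract refers to. The coefficients for all p variables are recovered at the end, so no gene selection takes place.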
Pages: 187-198
Number of pages: 12
Related papers
50 in total
  • [21] Penalized logistic regression for detecting gene interactions
    Park, Mee Young
    Hastie, Trevor
    [J]. BIOSTATISTICS, 2008, 9 (01) : 30 - 50
  • [23] Penalized robust estimators in sparse logistic regression
    Bianco, Ana M.
    Boente, Graciela
    Chebi, Gonzalo
    [J]. TEST, 2022, 31 (03) : 563 - 594
  • [24] Model Selection Via Penalized Logistic Regression
    Ayers, Kristin L.
    Cordell, Heather J.
    [J]. GENETIC EPIDEMIOLOGY, 2009, 33 (08) : 770 - 770
  • [25] Comparing two samples by penalized logistic regression
    Fokianos, Konstantinos
    [J]. ELECTRONIC JOURNAL OF STATISTICS, 2008, 2 : 564 - 580
  • [26] Logistic regression for disease classification using microarray data: model selection in a large p and small n case
    Liao, J. G.
    Chin, Khew-Voon
    [J]. BIOINFORMATICS, 2007, 23 (15) : 1945 - 1951
  • [27] LogSum+L2 penalized logistic regression model for biomarker selection and cancer classification
    Liu, Xiao-Ying
    Wu, Sheng-Bing
    Zeng, Wen-Quan
    Yuan, Zhan-Jiang
    Xu, Hong-Bo
    [J]. SCIENTIFIC REPORTS, 2020, 10 (01)
  • [29] Penalized logistic regression with the adaptive LASSO for gene selection in high-dimensional cancer classification
    Algamal, Zakariya Yahya
    Lee, Muhammad Hisyam
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (23) : 9326 - 9332
  • [30] Improving Penalized Logistic Regression Model with Missing Values in High-Dimensional Data
    Alharthi, Aiedh Mrisi
    Lee, Muhammad Hisyam
    Algamal, Zakariya Yahya
    [J]. INTERNATIONAL JOURNAL OF ONLINE AND BIOMEDICAL ENGINEERING, 2022, 18 (02) : 40 - 54