Using principal components for estimating logistic regression with high-dimensional multicollinear data

Cited by: 134
Authors
Aguilera, AM [1]
Escabias, M [1]
Valderrama, MJ [1]
Affiliation
[1] Univ Granada, Dept Stat & OR, Granada, Spain
Keywords
logistic regression; multicollinearity; principal components
DOI
10.1016/j.csda.2005.03.011
Chinese Library Classification (CLC) number
TP39 [Applications of computers]
Discipline classification code
081203; 0835
Abstract
The logistic regression model is used to predict a binary response variable in terms of a set of explanatory variables. When there is multicollinearity (high dependence) among the predictors, the estimates of the model parameters are not very accurate and their interpretation in terms of odds ratios may be erroneous. Another important problem is the large number of explanatory variables usually needed to explain the response. In order to improve the estimation of the logistic model parameters under multicollinearity and to reduce the dimension of the problem with continuous covariates, a reduced set of optimum principal components of the original predictors is proposed as the covariates of the logistic model. Finally, the performance of the proposed principal component logistic regression model is analyzed through a simulation study in which different methods for selecting the optimum principal components are compared. (c) 2005 Elsevier B.V. All rights reserved.
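The idea in the abstract can be illustrated with a short sketch. If the centred predictors X have principal component scores Z = XV (V holding the eigenvectors of the sample covariance matrix), a logit model on Z with coefficients gamma is equivalent to a logit model on X with beta = V gamma; keeping only the first s components gives beta = V_s gamma_s. The Python/NumPy code below is a minimal sketch under stated assumptions, not the authors' implementation: components are entered in order of explained variance, the fit is plain Newton-Raphson maximum likelihood, and `fit_logistic_ml` and `pc_logistic_regression` are hypothetical helper names. It does not reproduce the component-selection methods compared in the paper.

```python
import numpy as np


def fit_logistic_ml(A, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).

    A: n x k design matrix WITHOUT an intercept column (one is prepended here).
    Returns the coefficient vector [intercept, slopes...].
    """
    D = np.column_stack([np.ones(len(y)), A])
    coef = np.zeros(D.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(D @ coef)))   # fitted probabilities
        w = p * (1.0 - p)                       # IRLS weights
        grad = D.T @ (y - p)                    # score vector
        hess = D.T @ (D * w[:, None])           # Fisher information matrix
        step = np.linalg.solve(hess, grad)
        coef = coef + step
        if np.max(np.abs(step)) < tol:
            break
    return coef


def pc_logistic_regression(X, y, s):
    """Hypothetical helper: logistic regression on the first s principal
    components of X, with estimates mapped back to the original predictors.
    Assumption: components are kept in order of explained variance."""
    mean = X.mean(axis=0)
    Xc = X - mean                                # centre: PCA of the covariance matrix
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_s = Vt[:s].T                               # p x s loadings of the kept components
    Z = Xc @ V_s                                 # n x s principal component scores
    coef = fit_logistic_ml(Z, y)
    gamma0, gamma = coef[0], coef[1:]
    beta = V_s @ gamma                           # slopes for the original predictors
    beta0 = gamma0 - mean @ beta                 # intercept on the uncentred scale
    return beta0, beta


# Simulated multicollinear data: four predictors that are noisy copies of one factor.
rng = np.random.default_rng(0)
n = 500
factor = rng.normal(size=(n, 1))
X = factor + 0.05 * rng.normal(size=(n, 4))
true_beta = np.full(4, 0.5)
prob = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
y = (rng.uniform(size=n) < prob).astype(float)

b0_pc, b_pc = pc_logistic_regression(X, y, s=1)    # one component
full = fit_logistic_ml(X, y)                        # ordinary ML on all predictors
b0_all, b_all = pc_logistic_regression(X, y, s=4)   # all components
print("PCLR, s=1:", b0_pc, b_pc)
print("ML, all predictors:", full[0], full[1:])
```

With s equal to the number of predictors, the back-transformed estimates coincide with ordinary maximum likelihood, which serves as a quick consistency check. With s = 1 the four slopes come out nearly equal, reflecting the single underlying factor, whereas the individual full-model slopes are typically much less stable under this degree of collinearity.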
Pages: 1905-1924
Number of pages: 20
Related papers
50 in total
  • [31] Estimating dependency and significance for high-dimensional data
    Siracusa, MR
    Tieu, K
    Ihler, AT
    Fisher, JW
    Willsky, AS
    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005 : 1085 - 1088
  • [32] Unconditional quantile regression with high-dimensional data
    Sasaki, Yuya
    Ura, Takuya
    Zhang, Yichong
    QUANTITATIVE ECONOMICS, 2022, 13 (03) : 955 - 978
  • [33] Robust Ridge Regression for High-Dimensional Data
    Maronna, Ricardo A.
    TECHNOMETRICS, 2011, 53 (01) : 44 - 53
  • [34] Principal component analysis for sparse high-dimensional data
    Raiko, Tapani
    Ilin, Alexander
    Karhunen, Juha
    NEURAL INFORMATION PROCESSING, PART I, 2008, 4984 : 566 - 575
  • [35] Visualization of high-dimensional data on the probabilistic principal surface
    Chang, KY
    Ghosh, J
    PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON INDUSTRIAL ENGINEERING AND ENGINEERING MANAGEMENT, VOLS 1 AND 2: INDUSTRIAL ENGINEERING AND ENGINEERING MANAGEMENT IN THE GLOBAL ECONOMY, 2005 : 1315 - 1319
  • [36] Forecasting High-Dimensional Covariance Matrices Using High-Dimensional Principal Component Analysis
    Shigemoto, Hideto
    Morimoto, Takayuki
    AXIOMS, 2022, 11 (12)
  • [37] A modern maximum-likelihood theory for high-dimensional logistic regression
    Sur, Pragya
    Candes, Emmanuel J.
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2019, 116 (29) : 14516 - 14525
  • [38] Global and Simultaneous Hypothesis Testing for High-Dimensional Logistic Regression Models
    Ma, Rong
    Cai, T. Tony
    Li, Hongzhe
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2021, 116 (534) : 984 - 998
  • [39] Introduction to variational Bayes for high-dimensional linear and logistic regression models
    Jang, Insong
    Lee, Kyoungjae
    KOREAN JOURNAL OF APPLIED STATISTICS, 2022, 35 (03) : 445 - 455
  • [40] Debiased inference for heterogeneous subpopulations in a high-dimensional logistic regression model
    Kim, Hyunjin
    Lee, Eun Ryung
    Park, Seyoung
    Scientific Reports, 13