Inference for the case probability in high-dimensional logistic regression

Authors
Guo, Zijian [1]
Rakshit, Prabrisha [1]
Herman, Daniel S. [2]
Chen, Jinbo [2]
Affiliations
[1] Department of Statistics, Rutgers University, Piscataway, NJ, United States
[2] Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, United States
Abstract
Labeling patients in electronic health records by disease or condition status, i.e., as cases or controls, has increasingly relied on prediction models that use high-dimensional variables derived from structured and unstructured electronic health record data. A major hurdle is the current lack of valid statistical inference methods for the case probability. In this paper, considering high-dimensional sparse logistic regression models for prediction, we propose a novel bias-corrected estimator for the case probability through the development of linearization and variance enhancement techniques. We establish asymptotic normality of the proposed estimator for any loading vector in high dimensions. We construct a confidence interval for the case probability and propose a hypothesis testing procedure for patient case-control labeling. We demonstrate the proposed method via extensive simulation studies and an application to real-world electronic health record data. ©2021 Zijian Guo, Prabrisha Rakshit, Daniel Herman and Jinbo Chen.
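To illustrate how an asymptotically normal estimate of the linear predictor can be turned into a confidence interval for the case probability and a case-control label, here is a minimal Python sketch. It is not the authors' estimator: the bias correction, linearization, and variance enhancement steps are not shown, and the inputs `linear_est` (an estimate of the linear predictor for one patient) and `std_err` (its estimated standard error) are assumed to be supplied by such a procedure. The function names and the 0.5 probability threshold used for labeling are hypothetical choices made for this example.

```python
import math
from statistics import NormalDist


def expit(t: float) -> float:
    """Logistic function; maps a linear predictor to a probability."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)  # avoids overflow for large negative t
    return e / (1.0 + e)


def case_probability_ci(linear_est: float, std_err: float, alpha: float = 0.05):
    """Return (point estimate, lower, upper) for the case probability.

    Assumes linear_est is approximately normal around the true linear
    predictor with standard deviation std_err. The interval is built on
    the linear scale and mapped through the monotone logistic function.
    """
    if std_err <= 0:
        raise ValueError("std_err must be positive")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = expit(linear_est - z * std_err)
    upper = expit(linear_est + z * std_err)
    return expit(linear_est), lower, upper


def label_as_case(linear_est: float, std_err: float, alpha: float = 0.05) -> bool:
    """One-sided test: label as a case only if the case probability is
    significantly above 0.5, i.e. the linear predictor is significantly
    above 0 (the threshold is an assumption of this sketch)."""
    if std_err <= 0:
        raise ValueError("std_err must be positive")
    z = NormalDist().inv_cdf(1 - alpha)
    return linear_est - z * std_err > 0
```

Because the logistic function is monotone, the interval endpoints stay inside (0, 1) without any truncation, which is one reason to build the interval on the linear scale rather than directly on the probability scale.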