Debiased inference for heterogeneous subpopulations in a high-dimensional logistic regression model

Authors
Kim, Hyunjin [1]
Lee, Eun Ryung [1]
Park, Seyoung [1]
Affiliation
[1] Sungkyunkwan Univ, Dept Stat, Seoul 100190, South Korea
Funding
National Research Foundation, Singapore
Keywords
CONFIDENCE-INTERVALS; DRUG-SENSITIVITY; MONOAMINE-OXIDASE; SELECTION; IDENTIFICATION; METAANALYSIS; PREDICTION; SHRINKAGE; SPARSITY; REGIONS
DOI
10.1038/s41598-023-48903-x
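Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract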
Because complex data are now prevalent, data heterogeneity is frequently observed in contemporary scientific studies and applications. Motivated by studies of cancer cell lines, we consider the analysis of heterogeneous subpopulations with binary responses and high-dimensional covariates. In many practical scenarios, a single regression model is fitted to the entire data set. To do this effectively, it is critical to quantify, through appropriate statistical inference, how the effects of covariates vary across subpopulations. However, the high dimensionality and discrete nature of the data make such inference challenging. We therefore propose a novel statistical inference method for a high-dimensional logistic regression model that accounts for heterogeneous subpopulations. Our primary goal is to investigate heterogeneity across subpopulations by testing two hypotheses: whether the effect of a covariate is equal across subpopulations, and whether the overall effect of a covariate is significant. To achieve overall sparsity of the coefficients as well as fusion of the coefficients across subpopulations, we employ a fused group Lasso penalty. In addition, we develop a statistical inference method that corrects the bias of the proposed penalized estimator. To address the computational difficulties caused by the nonlinear log-likelihood and the fused Lasso penalty, we propose a fast, computationally efficient algorithm that adapts the proximal gradient method and the alternating direction method of multipliers (ADMM) to our setting. Furthermore, we develop non-asymptotic analyses of the proposed fused group Lasso and prove that the debiased test statistics admit chi-squared approximations even in the presence of high-dimensional covariates. In simulations, the proposed test outperforms existing methods. The practical effectiveness of the proposed method is demonstrated through an analysis of data from the Cancer Cell Line Encyclopedia (CCLE).
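The abstract pairs a nonlinear logistic log-likelihood with a group-sparsity penalty and mentions a proximal gradient scheme. The Python sketch below illustrates only that proximal gradient idea, under simplifying assumptions that are not taken from the paper: the K subpopulations share the same p covariates, the coefficients are stored as a p x K matrix B whose row j collects covariate j's effects across subpopulations, and the penalty is assumed to be a plain group Lasso, lambda * sum_j ||B[j, :]||_2. The fusion term, which the authors handle with ADMM, is deliberately omitted, so this is not the paper's algorithm; all function names, the fixed step size, and the simulated data are hypothetical.

```python
import numpy as np

# Illustrative sketch only: group-Lasso logistic regression over K subpopulations
# fitted by proximal gradient (ISTA). The fusion penalty of the paper is omitted,
# and all function names here are hypothetical.


def sigmoid(z):
    # Numerically stable logistic function, written via tanh.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def group_soft_threshold(B, thresh):
    # Proximal operator of thresh * sum_j ||B[j, :]||_2: each row (one covariate
    # across all subpopulations) is shrunk jointly, or set to zero if its norm <= thresh.
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
    return B * scale


def objective(Xs, ys, B, lam):
    # Average logistic negative log-likelihood over all N observations plus the group penalty.
    n_total = sum(len(y) for y in ys)
    loss = 0.0
    for k, (X, y) in enumerate(zip(Xs, ys)):
        z = X @ B[:, k]
        loss += np.sum(np.logaddexp(0.0, z) - y * z)
    return loss / n_total + lam * np.sum(np.linalg.norm(B, axis=1))


def prox_grad_group_logistic(Xs, ys, lam, n_iter=500):
    # Xs: list of K design matrices (n_k x p); ys: list of K 0/1 response vectors.
    # Returns B (p x K); column k holds the coefficients of subpopulation k.
    K, p = len(Xs), Xs[0].shape[1]
    n_total = sum(len(y) for y in ys)
    # The loss Hessian is block diagonal over subpopulations with logistic weights <= 1/4,
    # so max_k ||X_k||_2^2 / (4N) is a Lipschitz constant for the gradient.
    lipschitz = max(np.linalg.norm(X, 2) ** 2 for X in Xs) / (4.0 * n_total)
    step = 1.0 / lipschitz
    B = np.zeros((p, K))
    for _ in range(n_iter):
        grad = np.column_stack([
            X.T @ (sigmoid(X @ B[:, k]) - y) / n_total
            for k, (X, y) in enumerate(zip(Xs, ys))
        ])
        # Gradient step on the smooth loss, then the proximal step on the penalty.
        B = group_soft_threshold(B - step * grad, step * lam)
    return B


# Usage on simulated data (synthetic, for illustration only).
rng = np.random.default_rng(0)
p, K, n = 20, 3, 200
beta_true = np.zeros((p, K))
beta_true[0] = [1.5, 1.5, 1.5]   # covariate 0: same effect in every subpopulation
beta_true[1] = [1.0, -1.0, 0.0]  # covariate 1: heterogeneous effect
Xs = [rng.standard_normal((n, p)) for _ in range(K)]
ys = [rng.binomial(1, sigmoid(X @ beta_true[:, k])).astype(float) for k, X in enumerate(Xs)]
B_hat = prox_grad_group_logistic(Xs, ys, lam=0.05)
print(np.round(B_hat[:3], 2))
```

Row-wise block soft-thresholding either shrinks a covariate's K coefficients jointly or zeroes all of them, which is what yields covariate-level sparsity. Encouraging equal coefficients across subpopulations would additionally require a penalty on differences such as |B[j, k] - B[j, k']|, whose proximal operator generally has no simple closed form; that is where an ADMM-type splitting, as mentioned in the abstract, becomes useful.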
Pages: 19
Related papers
  • [1] Debiased inference for heterogeneous subpopulations in a high-dimensional logistic regression model
    Kim, Hyunjin
    Lee, Eun Ryung
    Park, Seyoung
    Scientific Reports, 13
  • [2] HIGH-DIMENSIONAL FACTOR REGRESSION FOR HETEROGENEOUS SUBPOPULATIONS
    Wang, Peiyao
    Li, Quefeng
    Shen, Dinggan
    Liu, Yufeng
    STATISTICA SINICA, 2023, 33 (01) : 27 - 53
  • [3] Debiased Inference on Treatment Effect in a High-Dimensional Model
    Wang, Jingshen
    He, Xuming
    Xu, Gongjun
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2020, 115 (529) : 442 - 454
  • [4] Inference for the case probability in high-dimensional logistic regression
    Guo, Zijian
    Rakshit, Prabrisha
    Herman, Daniel S.
    Chen, Jinbo
    Journal of Machine Learning Research, 2021, 22
  • [5] On inference in high-dimensional logistic regression models with separated data
    Lewis, R. M.
    Battey, H. S.
    BIOMETRIKA, 2024, 111 (03)
  • [6] STATISTICAL INFERENCE FOR GENETIC RELATEDNESS BASED ON HIGH-DIMENSIONAL LOGISTIC REGRESSION
    Ma, Rong
    Guo, Zijian
    Cai, T. Tony
    Li, Hongzhe
    STATISTICA SINICA, 2024, 34 (02) : 1023 - 1043
  • [7] SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression
    Yadlowsky, Steve
    Yun, Taedong
    McLean, Cory
    D'Amour, Alexander
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021
  • [8] A MODEL OF DOUBLE DESCENT FOR HIGH-DIMENSIONAL LOGISTIC REGRESSION
    Deng, Zeyu
    Kammoun, Abla
    Thrampoulidis, Christos
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020 : 4267 - 4271
  • [9] On inference in high-dimensional regression
    Battey, Heather S.
    Reid, Nancy
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2023, 85 (01) : 149 - 175
  • [10] Sparse and debiased lasso estimation and inference for high-dimensional composite quantile regression with distributed data
    Hou, Zhaohan
    Ma, Wei
    Wang, Lei
    TEST, 2023, 32 (04) : 1230 - 1250