Sparse model identification and learning for ultra-high-dimensional additive partially linear models

Cited by: 3
Authors
Li, Xinyi [1 ]
Wang, Li [2 ]
Nettleton, Dan [2 ]
Affiliations
[1] Univ N Carolina, Dept Stat & Operat Res, SAMSI, Chapel Hill, NC 27709 USA
[2] Iowa State Univ, Dept Stat, Ames, IA 50011 USA
Keywords
Dimension reduction; Inference for ultra-high-dimensional data; Semiparametric regression; Spline-backfitted local polynomial; Structure identification; Variable selection; Diverging number
DOI
10.1016/j.jmva.2019.02.010
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
The additive partially linear model (APLM) combines the flexibility of nonparametric regression with the parsimony of regression models, and has been widely used as a popular tool in multivariate nonparametric regression to alleviate the "curse of dimensionality". A natural question raised in practice is the choice of structure in the nonparametric part, i.e., whether the continuous covariates enter into the model in linear or nonparametric form. In this paper, we present a comprehensive framework for simultaneous sparse model identification and learning for ultra-high-dimensional APLMs where both the linear and nonparametric components are possibly larger than the sample size. We propose a fast and efficient two-stage procedure. In the first stage, we decompose the nonparametric functions into a linear part and a nonlinear part. The nonlinear functions are approximated by constant spline bases, and a triple penalization procedure is proposed to select nonzero components using adaptive group LASSO. In the second stage, we refit data with selected covariates using higher order polynomial splines, and apply spline-backfitted local-linear smoothing to obtain asymptotic normality for the estimators. The procedure is shown to be consistent for model structure identification. It can identify zero, linear, and nonlinear components correctly and efficiently. Inference can be made on both linear coefficients and nonparametric functions. We conduct simulation studies to evaluate the performance of the method and apply the proposed method to a dataset on the Shoot Apical Meristem (SAM) of maize genotypes for illustration. © 2019 Elsevier Inc. All rights reserved.
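The first-stage idea in the abstract (split each function into a linear part and a nonlinear part, approximate the nonlinear part with a constant spline basis, then select components with a group penalty) can be illustrated with a rough sketch. This is not the authors' implementation: it uses a plain, non-adaptive group LASSO solved by proximal gradient descent instead of the triple-penalized adaptive procedure, and the function names (`split_linear_nonlinear`, `group_lasso`), the synthetic data, the number of bins, and the tuning value `lam = 0.08` are all assumptions chosen only for illustration.

```python
import numpy as np

def split_linear_nonlinear(x, n_bins=6):
    """Return a standardized linear column and an orthonormalized
    piecewise-constant (constant spline) block orthogonal to it."""
    n = len(x)
    xc = x - x.mean()
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bins = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    B = np.eye(n_bins)[bins][:, 1:]            # bin indicators, one dropped
    B = B - B.mean(axis=0)                     # center each column
    B = B - np.outer(xc, xc @ B) / (xc @ xc)   # remove the linear trend
    Q, _ = np.linalg.qr(B)                     # orthonormalize the block
    lin = xc * np.sqrt(n) / np.linalg.norm(xc)
    return lin, Q * np.sqrt(n)

def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2."""
    norm = np.linalg.norm(v)
    return np.zeros_like(v) if norm <= t else (1.0 - t / norm) * v

def group_lasso(D, y, groups, lam, n_iter=500):
    """Proximal gradient for ||y - D b||^2 / (2n) + sum_g lam*sqrt(|g|)*||b_g||_2."""
    n = len(y)
    step = n / np.linalg.norm(D, 2) ** 2       # 1 / Lipschitz constant
    b = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = b - step * (D.T @ (D @ b - y)) / n
        for g in groups:
            b[g] = group_soft_threshold(z[g], step * lam * np.sqrt(len(g)))
    return b

# Synthetic example (assumed): x1 nonlinear, x2 linear, x3 irrelevant.
rng = np.random.default_rng(0)
n = 1000
X = rng.uniform(size=(n, 3))
y = np.sin(2 * np.pi * X[:, 0]) + 2.0 * X[:, 1] + 0.2 * rng.standard_normal(n)
y = y - y.mean()

cols, groups, start = [], [], 0
for j in range(X.shape[1]):
    lin, block = split_linear_nonlinear(X[:, j])
    cols += [lin[:, None], block]
    groups.append(np.arange(start, start + 1))                       # linear part
    groups.append(np.arange(start + 1, start + 1 + block.shape[1]))  # nonlinear part
    start += 1 + block.shape[1]
D = np.hstack(cols)

b = group_lasso(D, y, groups, lam=0.08)

labels = []
for j in range(X.shape[1]):
    lin_on = np.any(b[groups[2 * j]] != 0)
    nonlin_on = np.any(b[groups[2 * j + 1]] != 0)
    labels.append("nonlinear" if nonlin_on else "linear" if lin_on else "zero")
print(labels)
```

Each covariate contributes two penalized groups, so a covariate is labeled zero when both groups vanish, linear when only the linear coefficient survives, and nonlinear when the spline block survives. The second stage described in the abstract (refitting with higher order splines and spline-backfitted local-linear smoothing) is not covered by this sketch.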
Pages: 204-228
Page count: 25
Related papers
50 in total
  • [31] Sequential Feature Screening for Generalized Linear Models with Sparse Ultra-High Dimensional Data
    ZHANG Junying
    WANG Hang
    ZHANG Riquan
    ZHANG Jiajia
    [J]. Journal of Systems Science & Complexity, 2020, 33 (02) : 510 - 526
  • [32] Sequential Feature Screening for Generalized Linear Models with Sparse Ultra-High Dimensional Data
    Junying Zhang
    Hang Wang
    Riquan Zhang
    Jiajia Zhang
    [J]. Journal of Systems Science and Complexity, 2020, 33 : 510 - 526
  • [33] Sparsity identification for high-dimensional partially linear model with measurement error
    Li, Rui
    Zhao, Haibing
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2018, 47 (08) : 2378 - 2392
  • [34] STATISTICAL INFERENCE IN SPARSE HIGH-DIMENSIONAL ADDITIVE MODELS
    Gregory, Karl
    Mammen, Enno
    Wahl, Martin
    [J]. ANNALS OF STATISTICS, 2021, 49 (03) : 1514 - 1536
  • [35] Double machine learning for partially linear mediation models with high-dimensional confounders
    Yang, Jichen
    Shao, Yujing
    Liu, Jin
    Wang, Lei
    [J]. Neurocomputing, 2025, 614
  • [36] Algorithms for learning sparse additive models with interactions in high dimensions
    Tyagi, Hemant
    Kyrillidis, Anastasios
    Gartner, Bernd
    Krause, Andreas
    [J]. INFORMATION AND INFERENCE-A JOURNAL OF THE IMA, 2018, 7 (02) : 183 - 249
  • [37] Efficient Sampling for Learning Sparse Additive Models in High Dimensions
    Tyagi, Hemant
    Krause, Andreas
    Gartner, Bernd
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [38] Infinite-Dimensional Sparse Learning in Linear System Identification
    Yin, Mingzhou
    Akan, Mehmet Tolga
    Iannelli, Andrea
    Smith, Roy S.
    [J]. 2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022 : 850 - 855
  • [39] Partially linear structure identification in generalized additive models with NP-dimensionality
    Lian, Heng
    Du, Pang
    Li, YuanZhang
    Liang, Hua
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, 80 : 197 - 208
  • [40] Uncertainty Quantification for High-Dimensional Sparse Nonparametric Additive Models
    Gao, Qi
    Lai, Randy C. S.
    Lee, Thomas C. M.
    Li, Yao
    [J]. TECHNOMETRICS, 2020, 62 (04) : 513 - 524