High-dimensional variable selection in regression and classification with missing data

Cited by: 6
Authors
Gao, Qi [1 ]
Lee, Thomas C. M. [1 ]
Affiliations
[1] Univ Calif Davis, Dept Stat, One Shields Ave, Davis, CA 95616 USA
Source
SIGNAL PROCESSING | 2017, Vol. 131
Funding
U.S. National Science Foundation
Keywords
Adaptive lasso; Logistic regression; Low rank recovery; Matrix completion; Likelihood; Signals; Noisy; Model
DOI
10.1016/j.sigpro.2016.07.014
Chinese Library Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline Codes
0808; 0809
Abstract
Variable selection for high-dimensional data problems, including both regression and classification, has been a subject of intense research activity in recent years, and many promising solutions have been proposed. However, less attention has been given to the case when some of the data are missing. This paper proposes a general approach to high-dimensional variable selection in the presence of missing data when the missing fraction can be relatively large (e.g., 50%). Both regression and classification are considered. The proposed approach iterates between two major steps: the first step uses matrix completion to impute the missing data, while the second step applies the adaptive lasso to the imputed data to select the significant variables. Methods are provided for choosing all the involved tuning parameters. As fast algorithms and software are widely available for both matrix completion and the adaptive lasso, the proposed approach is fast and straightforward to implement. Results from numerical experiments and applications to two real data sets are presented to demonstrate the efficiency and effectiveness of the approach. (C) 2016 Elsevier B.V. All rights reserved.
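The record contains no code, so the Python sketch below is only a rough illustration of the two building blocks named in the abstract, not the authors' implementation: a single pass of soft-thresholded SVD matrix completion followed by an adaptive lasso fit. All function names and settings (soft_impute, adaptive_lasso, lam, gamma, the ridge pilot fit, the toy low-rank design) are assumptions for the example; the paper itself iterates between the two steps and provides data-driven choices of the tuning parameters.

    # Illustrative sketch only: one pass of "impute via matrix completion,
    # then select via adaptive lasso". lam/gamma and the ridge pilot fit
    # are arbitrary choices, not the paper's tuning procedure.
    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    def soft_impute(X_obs, mask, lam=2.0, n_iter=100):
        """Fill unobserved entries (mask == False) by soft-thresholded SVD."""
        Z = np.where(mask, X_obs, 0.0)
        for _ in range(n_iter):
            U, s, Vt = np.linalg.svd(Z, full_matrices=False)
            s = np.maximum(s - lam, 0.0)             # shrink singular values -> low rank
            Z = np.where(mask, X_obs, (U * s) @ Vt)  # keep observed entries fixed
        return Z

    def adaptive_lasso(X, y, lam=0.05, gamma=1.0):
        """Adaptive lasso via feature rescaling, with a ridge fit as pilot weights."""
        pilot = Ridge(alpha=1.0).fit(X, y).coef_
        w = np.abs(pilot) ** gamma + 1e-8            # per-variable penalty weights
        beta = Lasso(alpha=lam).fit(X * w, y).coef_
        return beta * w                              # map back to the original scale

    # Toy example: approximately low-rank 200 x 50 design, 5 active variables,
    # roughly half of the entries missing completely at random.
    rng = np.random.default_rng(0)
    X_full = rng.standard_normal((200, 8)) @ rng.standard_normal((8, 50)) \
             + 0.1 * rng.standard_normal((200, 50))
    beta_true = np.zeros(50)
    beta_true[:5] = 2.0
    y = X_full @ beta_true + 0.1 * rng.standard_normal(200)
    mask = rng.random(X_full.shape) > 0.5            # True = observed entry
    X_imputed = soft_impute(np.where(mask, X_full, np.nan), mask)
    beta_hat = adaptive_lasso(X_imputed, y)
    print("selected variables:", np.flatnonzero(np.abs(beta_hat) > 1e-6))

The adaptive lasso here uses the standard reweighting trick (rescale each column by its pilot coefficient, fit an ordinary lasso, then rescale the coefficients back); the classification case in the paper would replace this regression fit with a penalized logistic regression.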
Pages: 1-7 (7 pages)