Multiple outliers detection in sparse high-dimensional regression

Cited by: 13
Authors
Wang, Tao [1, 2, 3]
Li, Qun [4]
Chen, Bin [5]
Li, Zhonghua [1, 2]
Affiliations
[1] Nankai Univ, Inst Stat, Tianjin 300071, Peoples R China
[2] Nankai Univ, LPMC, Tianjin 300071, Peoples R China
[3] Huaiyin Normal Univ, Sch Math Sci, Huaian, Peoples R China
[4] Nankai Univ, Sch Math Sci, Tianjin, Peoples R China
[5] Jiangsu Normal Univ, Sch Math & Stat, Xuzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
High-dimensional linear regression; least trimmed square; multiple hypothesis testing; multiple outliers detection; 62J20; 62H15; 62J05; 62F35; HIGH BREAKDOWN-POINT; LARGE DATA SETS; SQUARES REGRESSION; LINEAR-REGRESSION; INFLUENTIAL OBSERVATIONS; IDENTIFICATION; SCALE;
DOI
10.1080/00949655.2017.1379521
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Outliers inevitably distort analysis and lead to inappropriate prediction, especially when multiple outliers occur in high-dimensional regression, where the high dimensionality of the data may increase the chance that one or more observations are outlying. Because outlier detection is both necessary and important in high-dimensional regression analysis, we propose a feasible outlier detection approach for the sparse high-dimensional linear regression model. First, we search for a clean subset using the sure independence screening method and least trimmed squares regression estimates. Then, we define a high-dimensional outlier detection measure and propose a multiple outlier detection approach based on multiple testing procedures. In addition, to enhance efficiency, we refine the outlier detection rule after obtaining a relatively reliable non-outlier subset from the initial detection approach. Comparison studies based on Monte Carlo simulation show that the proposed method performs well in detecting multiple outliers in the sparse high-dimensional linear regression model. We further illustrate the method with an empirical analysis of a real-life protein and gene expression data set.
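The abstract only outlines the procedure, so the following is a minimal sketch of the general screen, trim, and test idea rather than the authors' method. It assumes marginal-correlation ranking as a stand-in for sure independence screening, random-start concentration steps as an approximation to least trimmed squares, and normal-approximation p-values with a MAD scale and a Benjamini-Hochberg cutoff in place of the paper's detection measure and testing procedure, which the abstract does not specify. The refinement step is omitted. All function names, the simulated data, and the tuning values (d, h, alpha) are hypothetical.

```python
import math
import numpy as np

def sis_screen(X, y, d):
    """Keep the d predictors with the largest absolute marginal correlation with y."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = (y - y.mean()) / y.std()
    score = np.abs(Xc.T @ yc) / len(y)
    return np.sort(np.argsort(score)[-d:])

def ols(X, y):
    """Least squares fit with intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def residuals(X, y, beta):
    return y - beta[0] - X @ beta[1:]

def lts_clean_subset(X, y, h, n_starts=50, n_steps=20, seed=0):
    """Approximate LTS: from random small starts, repeatedly refit on the
    h observations with the smallest squared residuals (concentration steps)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    best_obj, best_idx = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=d + 2, replace=False)
        for _ in range(n_steps):
            r2 = residuals(X, y, ols(X[idx], y[idx])) ** 2
            idx = np.argsort(r2)[:h]
        obj = np.sum(residuals(X[idx], y[idx], ols(X[idx], y[idx])) ** 2)
        if obj < best_obj:
            best_obj, best_idx = obj, idx
    return np.sort(best_idx)

def detect_outliers(X, y, d=10, h=None, alpha=0.05):
    """Screen predictors, find a clean subset, then flag observations by
    Benjamini-Hochberg on two-sided normal p-values of scaled residuals."""
    n = len(y)
    h = h if h is not None else int(0.75 * n)
    keep = sis_screen(X, y, d)
    Xs = X[:, keep]
    clean = lts_clean_subset(Xs, y, h)
    r = residuals(Xs, y, ols(Xs[clean], y[clean]))
    sigma = 1.4826 * np.median(np.abs(r))  # robust (MAD-type) scale, assumed here
    p = np.array([math.erfc(abs(v) / (sigma * math.sqrt(2))) for v in r])
    order = np.argsort(p)
    passed = np.nonzero(p[order] <= alpha * np.arange(1, n + 1) / n)[0]
    k = passed.max() + 1 if passed.size else 0
    return np.sort(order[:k])

# Simulated example (hypothetical): n = 100, p = 200, 3 active predictors,
# and the first 5 responses shifted to act as outliers.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 200))
beta_true = np.zeros(200)
beta_true[:3] = 5.0
y = X @ beta_true + rng.standard_normal(100)
true_outliers = np.arange(5)
y[true_outliers] += 20.0

flagged = detect_outliers(X, y)
print("flagged observations:", flagged)
```

In this sketch the screening step reduces the problem to a low-dimensional fit so that trimmed least squares becomes feasible, and the false discovery rate cutoff controls how many observations are flagged at once. A faithful implementation would follow the detection measure and refinement rule defined in the paper itself.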
Pages: 89-107
Number of pages: 19