Multiple outliers detection in sparse high-dimensional regression

Cited by: 13
Authors
Wang, Tao [1, 2, 3]
Li, Qun [4]
Chen, Bin [5]
Li, Zhonghua [1, 2]
Affiliations
[1] Nankai Univ, Inst Stat, Tianjin 300071, Peoples R China
[2] Nankai Univ, LPMC, Tianjin 300071, Peoples R China
[3] Huaiyin Normal Univ, Sch Math Sci, Huaian, Peoples R China
[4] Nankai Univ, Sch Math Sci, Tianjin, Peoples R China
[5] Jiangsu Normal Univ, Sch Math & Stat, Xuzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
High-dimensional linear regression; least trimmed square; multiple hypothesis testing; multiple outliers detection; 62J20; 62H15; 62J05; 62F35; HIGH BREAKDOWN-POINT; LARGE DATA SETS; SQUARES REGRESSION; LINEAR-REGRESSION; INFLUENTIAL OBSERVATIONS; IDENTIFICATION; SCALE;
DOI
10.1080/00949655.2017.1379521
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Outliers distort analysis and degrade prediction, and the problem is especially acute for multiple outliers in high-dimensional regression, where the high dimensionality of the data may increase the chance that one or more observations are outlying. Because outlier detection is both necessary and important in high-dimensional regression analysis, this paper proposes a feasible outlier detection approach for the sparse high-dimensional linear regression model. First, we search for a clean subset using the sure independence screening method and least trimmed squares regression estimates. Then, we define a high-dimensional outlier detection measure and propose a multiple outlier detection approach based on multiple testing procedures. In addition, to improve efficiency, we refine the outlier detection rule after obtaining a relatively reliable non-outlier subset from the initial detection approach. Monte Carlo simulation comparisons show that the proposed method performs well in detecting multiple outliers in the sparse high-dimensional linear regression model. We further illustrate the method with an empirical analysis of real-life protein and gene expression data.
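The abstract names the stages of the method but not its formulas, so the following Python sketch only illustrates the general idea and should not be read as the authors' procedure. It assumes marginal-correlation screening as the sure independence screening step, approximates least trimmed squares with random starts and concentration steps, and swaps in MAD-scaled residuals with normal p-values and the Benjamini-Hochberg rule for the paper's unspecified detection measure and testing procedure. All function names and defaults (`sis_screen`, `lts_clean_subset`, `detect_outliers`, `d`, `h`, `alpha`) are hypothetical, and the refinement step is omitted.

```python
import numpy as np
from math import erfc, sqrt


def sis_screen(X, y, d):
    """Keep the d predictors with the largest absolute marginal correlation with y."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = (y - y.mean()) / y.std()
    score = np.abs(Xc.T @ yc) / len(y)
    return np.sort(np.argsort(score)[::-1][:d])


def lts_clean_subset(X, y, h, n_starts=50, n_csteps=20, seed=0):
    """Approximate LTS: random small starts, then concentration steps that
    refit on the h observations with the smallest squared residuals."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])  # add an intercept column
    best_obj, best_idx = np.inf, None
    for _ in range(n_starts):
        idx = np.sort(rng.choice(n, size=p + 2, replace=False))
        for _ in range(n_csteps):
            beta, *_ = np.linalg.lstsq(Z[idx], y[idx], rcond=None)
            new_idx = np.sort(np.argsort((y - Z @ beta) ** 2)[:h])
            if np.array_equal(idx, new_idx):
                break  # the subset no longer changes
            idx = new_idx
        # Score this start by the trimmed sum of squares on its final subset.
        beta, *_ = np.linalg.lstsq(Z[idx], y[idx], rcond=None)
        obj = np.sum((y[idx] - Z[idx] @ beta) ** 2)
        if obj < best_obj:
            best_obj, best_idx = obj, idx
    return best_idx


def detect_outliers(X, y, d=5, h=None, alpha=0.05):
    """Screen predictors, find a clean subset, then flag observations whose
    robustly scaled residuals are significant under Benjamini-Hochberg."""
    n = len(y)
    h = h or int(0.75 * n)  # assumed trimming level, not taken from the paper
    keep = sis_screen(X, y, d)
    Xs = X[:, keep]
    clean = lts_clean_subset(Xs, y, h)
    Z = np.column_stack([np.ones(n), Xs])
    beta, *_ = np.linalg.lstsq(Z[clean], y[clean], rcond=None)
    resid = y - Z @ beta
    # Robust scale estimate: median absolute deviation, normal-consistent.
    sigma = 1.4826 * np.median(np.abs(resid - np.median(resid)))
    # Two-sided normal p-values, P(|N(0,1)| > |r| / sigma).
    pvals = np.array([erfc(abs(r) / sigma / sqrt(2)) for r in resid])
    # Benjamini-Hochberg step-up rule at level alpha.
    order = np.argsort(pvals)
    passed = pvals[order] <= alpha * np.arange(1, n + 1) / n
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    return np.sort(order[:k]), keep
```

Calling `detect_outliers(X, y, d=10)` on an `n`-by-`p` design matrix `X` and response `y` returns the indices of the flagged observations together with the indices of the screened predictors. Plain correlation screening is itself sensitive to outliers, so a robust screening statistic may be preferable in practice.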
Pages: 89-107
Number of pages: 19
Related papers
50 in total
  • [1] Sparse PCA for High-Dimensional Data With Outliers
    Hubert, Mia
    Reynkens, Tom
    Schmitt, Eric
    Verdonck, Tim
    [J]. TECHNOMETRICS, 2016, 58 (04) : 424 - 434
  • [2] Sparse High-Dimensional Isotonic Regression
    Gamarnik, David
    Gaudio, Julia
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [3] High-Dimensional Classification by Sparse Logistic Regression
    Abramovich, Felix
    Grinshtein, Vadim
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2019, 65 (05) : 3068 - 3079
  • [4] High-Dimensional Sparse Additive Hazards Regression
    Lin, Wei
    Lv, Jinchi
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2013, 108 (501) : 247 - 264
  • [5] Detection of outliers in high-dimensional data using nu-support vector regression
    Mohammed Rashid, Abdullah
    Midi, Habshah
    Dhhan, Waleed
    Arasan, Jayanthi
    [J]. JOURNAL OF APPLIED STATISTICS, 2022, 49 (10) : 2550 - 2569
  • [6] Multiple Change Points Detection in High-Dimensional Multivariate Regression
    MA Xiaoyan
    ZHOU Qin
    ZI Xuemin
    [J]. Journal of Systems Science & Complexity, 2022, 35 (06) : 2278 - 2301
  • [7] Multiple Change Points Detection in High-Dimensional Multivariate Regression
    Xiaoyan Ma
    Qin Zhou
    Xuemin Zi
    [J]. Journal of Systems Science and Complexity, 2022, 35 : 2278 - 2301
  • [8] Multiple Change Points Detection in High-Dimensional Multivariate Regression
    Ma Xiaoyan
    Zhou Qin
    Zi Xuemin
    [J]. JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY, 2022, 35 (06) : 2278 - 2301
  • [9] Cluster PCA for outliers detection in high-dimensional data
    Stefatos, George
    Ben Hamza, A.
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-8, 2007, : 3961 - 3966
  • [10] Testing Regression Coefficients in High-Dimensional and Sparse Settings
    Xu, Kai
    Tian, Yan
    Cheng, Qing
    [J]. ACTA MATHEMATICA SINICA-ENGLISH SERIES, 2021, 37 (10) : 1513 - 1532