Outlier detection in high-dimensional regression model

Cited by: 14
Authors
Wang, Tao [1, 2]
Li, Zhonghua [3, 4]
Affiliations
[1] Nankai Univ, Sch Math Sci, Tianjin, Peoples R China
[2] Kashgar Univ, Sch Math & Stat, Kashgar City, Peoples R China
[3] Nankai Univ, Inst Stat, Tianjin 300071, Peoples R China
[4] Nankai Univ, LPMC, Tianjin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Bootstrap; Cook's distance; distance correlation; high-dimensional; leave-one-out; outlier detection; 62H20; 62J02; 62J05; 62J86; LINEAR-REGRESSION;
DOI
10.1080/03610926.2016.1140783
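Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract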
An outlier is an observation that differs markedly from the other observations in its dataset. In high-dimensional regression analysis, datasets often contain a proportion of outliers, and identifying and removing them is important when fitting a model. This paper proposes a novel outlier detection method for high-dimensional regression problems. A leave-one-out approach is used to construct an outlier detection measure based on distance correlation, from which a detection procedure is derived. The proposed method has several advantages. First, the detection measure is simple to calculate, and the procedure works efficiently even for high-dimensional regression data. Second, it handles general regression settings and does not require a linear regression model to be specified. Finally, simulation studies show that the proposed method detects outliers well in high-dimensional regression models and outperforms several competing methods.
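The abstract does not state the formula for the detection measure, so the following is only a rough sketch of the leave-one-out idea it describes, not the authors' statistic. It assumes the standard sample distance correlation between the predictors and the response, and scores each observation by how much that statistic changes when the observation is removed. The function names (`distance_correlation`, `loo_dcor_scores`) and the simulated data are hypothetical, chosen for illustration.

```python
import numpy as np


def _centered_distances(a):
    """Double-centered Euclidean distance matrix of the rows of `a`."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    # Pairwise distances; this uses O(n^2 * p) memory, fine for a small demo.
    d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation between x (n or n x p) and y (n or n x q)."""
    A = _centered_distances(x)
    B = _centered_distances(y)
    dcov2 = (A * B).mean()  # squared sample distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    if denom == 0.0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def loo_dcor_scores(X, y):
    """Assumed score: dCor without observation i minus dCor on the full sample."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    full = distance_correlation(X, y)
    scores = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        scores[i] = distance_correlation(X[keep], y[keep]) - full
    return scores


if __name__ == "__main__":
    # Hypothetical simulated data with more predictors than observations.
    rng = np.random.default_rng(0)
    n, p = 40, 200
    X = rng.normal(size=(n, p))
    y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=n)
    y[0] += 25.0  # contaminate one response value
    scores = loo_dcor_scores(X, y)
    print("indices ranked by score:", np.argsort(scores)[::-1][:5])
```

Under this assumed definition, a large positive score means the dependence between predictors and response strengthens once the observation is dropped, which makes it a candidate outlier. The sketch only ranks observations; choosing a cut-off (the keywords mention the bootstrap) is left out because the abstract does not describe how the threshold is set.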
Pages: 6947-6958
Page count: 12