Outlier detection in high-dimensional regression model

Cited by: 14
Authors
Wang, Tao [1, 2]
Li, Zhonghua [3, 4]
Affiliations
[1] Nankai Univ, Sch Math Sci, Tianjin, Peoples R China
[2] Kashgar Univ, Sch Math & Stat, Kashgar City, Peoples R China
[3] Nankai Univ, Inst Stat, Tianjin 300071, Peoples R China
[4] Nankai Univ, LPMC, Tianjin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Bootstrap; Cook's distance; distance correlation; high-dimensional; leave-one-out; outlier detection; 62H20; 62J02; 62J05; 62J86; LINEAR-REGRESSION
DOI
10.1080/03610926.2016.1140783
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
An outlier is an observation that differs significantly from the others in its dataset. In high-dimensional regression analysis, datasets often contain some outliers, and identifying and removing them is important when fitting a model. In this paper, a novel outlier detection method is proposed for high-dimensional regression problems. The leave-one-out idea is used to construct an outlier detection measure based on distance correlation, and an outlier detection procedure is then built on that measure. The proposed method has several advantages. First, the detection measure is simple to calculate, and the detection procedure works efficiently even for high-dimensional regression data. Moreover, it applies to general regression and does not require specifying a linear regression model. Finally, simulation studies show that the proposed method detects outliers well in high-dimensional regression models and performs better than some competing methods.
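The abstract does not state the exact form of the detection measure, so the following is only an illustrative sketch of the general idea, not the authors' procedure. It assumes the measure compares the sample distance correlation between predictors and response with and without each observation; the function names `distance_correlation` and `loo_dcor_scores` are hypothetical, and the bootstrap step suggested by the keywords (for choosing a cutoff) is not shown.

```python
import numpy as np

def _centered_distances(a):
    """Double-centered Euclidean distance matrix of the rows of `a`."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    # Pairwise differences use O(n^2 * p) memory; fine for a small sketch.
    d = np.sqrt(((a[:, None, :] - a[None, :, :]) ** 2).sum(axis=-1))
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())

def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) between x and y."""
    A = _centered_distances(x)
    B = _centered_distances(y)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))

def loo_dcor_scores(X, y):
    """Assumed measure: change in distance correlation when row i is dropped.

    A large positive score means that removing observation i strengthens
    the measured dependence between X and y, which flags i as a candidate
    outlier under this (assumed) reading of the leave-one-out idea.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    full = distance_correlation(X, y)
    scores = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        scores[i] = distance_correlation(X[keep], y[keep]) - full
    return scores
```

Calling `loo_dcor_scores(X, y)` on an `n × p` predictor matrix and a length-`n` response returns one score per observation; how large a score must be to declare an outlier is left open here, since the abstract does not describe the cutoff rule.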
Pages: 6947-6958
Number of pages: 12