Outlier detection in high-dimensional regression model

被引:14
|
作者
Wang, Tao [1 ,2 ]
Li, Zhonghua [3 ,4 ]
机构
[1] Nankai Univ, Sch Math Sci, Tianjin, Peoples R China
[2] Kashgar Univ, Sch Math & Stat, Kashgar City, Peoples R China
[3] Nankai Univ, Inst Stat, Tianjin 300071, Peoples R China
[4] Nankai Univ, LPMC, Tianjin, Peoples R China
基金
中国国家自然科学基金;
关键词
Bootstrap; Cook's distance; distance correlation; high-dimensional; leave-one-out; outlier detection; 62H20; 62J02; 62J05; 62J86; LINEAR-REGRESSION;
D O I
10.1080/03610926.2016.1140783
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
An outlier is defined as an observation that is significantly different from the others in its dataset. In high-dimensional regression analysis, datasets often contain a portion of outliers. It is important to identify and eliminate the outliers for fitting a model to a dataset. In this paper, a novel outlier detection method is proposed for high-dimensional regression problems. The leave-one-out idea is utilized to construct a novel outlier detection measure based on distance correlation, and then an outlier detection procedure is proposed. The proposed method enjoys several advantages. First, the outlier detection measure can be simply calculated, and the detection procedure works efficiently even for high-dimensional regression data. Moreover, it can deal with a general regression, which does not require specification of a linear regression model. Finally, simulation studies show that the proposed method behaves well for detecting outliers in high-dimensional regression model and performs better than some other competing methods.
引用
收藏
页码:6947 / 6958
页数:12
相关论文
共 50 条
  • [21] Research on Outlier Detection for High-Dimensional Data Based on PPCLOF
    Chen, Chen
    Luo, Kaiwen
    Min, Lan
    Li, Shenglin
    JOURNAL OF WEB ENGINEERING, 2021, 20 (03): : 743 - 758
  • [22] ROBOUT: a conditional outlier detection methodology for high-dimensional data
    Farne, Matteo
    Vouldis, Angelos
    STATISTICAL PAPERS, 2024, 65 (04) : 2489 - 2525
  • [23] High-dimensional model averaging for quantile regression
    Xie, Jinhan
    Ding, Xianwen
    Jiang, Bei
    Yan, Xiaodong
    Kong, Linglong
    CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2024, 52 (02): : 618 - 635
  • [24] A regression sequences based method for high dimensional outlier detection
    Dafei W.
    Journal of Discrete Mathematical Sciences and Cryptography, 2017, 20 (04) : 931 - 943
  • [25] OUTLIER DETECTION WITH ENHANCED ANGLE-BASED OUTLIER FACTOR IN HIGH-DIMENSIONAL DATA STREAM
    Shou, Zhaoyu
    Tian, Hao
    Li, Simin
    Zou, Fengbo
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2018, 14 (05): : 1633 - 1651
  • [26] A hybrid dimensionality reduction method for outlier detection in high-dimensional data
    Meng, Guanglei
    Wang, Biao
    Wu, Yanming
    Zhou, Mingzhe
    Meng, Tiankuo
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 (11) : 3705 - 3718
  • [27] Unsupervised Artificial Neural Networks for Outlier Detection in High-Dimensional Data
    Popovic, Daniel
    Fouche, Edouard
    Boehm, Klemens
    ADVANCES IN DATABASES AND INFORMATION SYSTEMS, ADBIS 2019, 2019, 11695 : 3 - 19
  • [28] Fast outlier detection for high-dimensional data of wireless sensor networks
    Qiao, Yan
    Cui, Xinhong
    Jin, Peng
    Zhang, Wu
    INTERNATIONAL JOURNAL OF DISTRIBUTED SENSOR NETWORKS, 2020, 16 (10)
  • [29] Ordinal Outlier Algorithm for Anomaly Detection of High-Dimensional Data Sets
    Chen, Gang
    Du, Linlin
    An, Baoran
    PROCEEDINGS OF THE 32ND 2020 CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2020), 2020, : 5356 - 5361
  • [30] Variational autoencoder-based outlier detection for high-dimensional data
    Li, Yongmou
    Wang, Yijie
    Ma, Xingkong
    INTELLIGENT DATA ANALYSIS, 2019, 23 (05) : 991 - 1002