Data-driven cluster analysis method: a novel outliers detection method in multivariate data

Cited by: 0
Authors
Duarte, A. R. [1]
Barbosa, J. J. [1]
Martins, H. S. R. [1]
Oliveira, F. L. P. [1]
Affiliations
[1] Univ Fed Ouro Preto, Stat Dept, Ouro Preto, Brazil
Keywords
Data-driven; Multivariate outliers; Cluster analysis; Bayesian information criterion; Accuracy; Mahalanobis distance; Identification
DOI
10.1080/03610918.2024.2376872
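Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract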
Detecting multivariate outliers is crucial in statistical studies, because statistical analyses carried out without identifying possible outliers may yield incorrect results. This study proposes a new technique for detecting multivariate outliers based on cluster analysis. The method relies on information inherent in the data itself. We compare the methodology with three widely used detection methods. The comparative investigation considers detection techniques based on the Mahalanobis distance. Sensitivity, specificity, and accuracy are used to assess the quality of the methods, together with an analysis of the CPU time required to carry out the procedures. The new technique showed a notable superiority over the others.
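This record does not describe the authors' algorithm, so the following is only a minimal sketch of the kind of baseline the abstract refers to: a classical Mahalanobis-distance detector, scored by sensitivity, specificity, and accuracy. It is not the proposed cluster-based method. The function names, the restriction to bivariate data, and the chi-square cutoff at `alpha = 0.025` are assumptions made for illustration.

```python
import math


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point to the sample mean."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix (denominator n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form d' S^{-1} d written out for the 2x2 case.
        distances.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def flag_outliers(points, alpha=0.025):
    """Flag points whose squared distance exceeds a chi-square cutoff (assumed rule)."""
    # With 2 degrees of freedom the upper-alpha chi-square quantile is -2 ln(alpha).
    cutoff = -2.0 * math.log(alpha)
    return [d2 > cutoff for d2 in mahalanobis_sq_2d(points)]


def confusion_metrics(truth, flagged):
    """Sensitivity, specificity, and accuracy from true labels and detector flags."""
    tp = sum(1 for t, f in zip(truth, flagged) if t and f)
    tn = sum(1 for t, f in zip(truth, flagged) if not t and not f)
    fp = sum(1 for t, f in zip(truth, flagged) if not t and f)
    fn = sum(1 for t, f in zip(truth, flagged) if t and not f)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / len(truth),
    }
```

A call such as `confusion_metrics(labels, flag_outliers(points))` returns the three measures for one data set. Because the mean and covariance here are estimated from all points, outliers can inflate them and mask each other, which is one reason robust or data-driven alternatives are studied.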
Pages: 21
Related papers
50 in total
  • [1] ON ROHLFS METHOD FOR THE DETECTION OF OUTLIERS IN MULTIVARIATE DATA
    CARONI, C
    PRESCOTT, P
    JOURNAL OF MULTIVARIATE ANALYSIS, 1995, 52 (02) : 295 - 307
  • [2] Detection of Outliers Method in Grouped Multivariate Data: A Method Based on Multiple Linear Regression
    Phuttisen, Suthat
    Srisodaphol, Wuttichai
    PAKISTAN JOURNAL OF STATISTICS AND OPERATION RESEARCH, 2024, 20 (03) : 445 - 453
  • [3] Detection of outliers in multivariate data: A method based on clustering and robust estimators
    Santos-Pereira, CM
    Pires, AM
    COMPSTAT 2002: PROCEEDINGS IN COMPUTATIONAL STATISTICS, 2002, : 291 - 296
  • [4] A Novel Outlier Detection Method for Multivariate Data
    Almardeny, Yahya
    Boujnah, Noureddine
    Cleary, Frances
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (09) : 4052 - 4062
  • [5] ON THE DETECTION OF MULTIVARIATE DATA OUTLIERS AND REGRESSION OUTLIERS
    LAZRAQ, A
    CLEROUX, R
    DATA ANALYSIS, LEARNING SYMBOLIC AND NUMERIC KNOWLEDGE, 1989, : 133 - 140
  • [6] A data-driven fault propagation analysis method
    Zhou, Funa
    Wen, Chenglin
    Leng, Yuanbao
    Chen, Zhiguo
    Huagong Xuebao/CIESC Journal, 2010, 61 (08) : 1993 - 2001
  • [7] A novel data-driven method for the analysis and reconstruction of cardiac cine MRI
    Groun, Nourelhouda
    Villalba-Orero, Maria
    Lara-Pezzi, Enrique
    Valero, Eusebio
    Garicano-Mena, Jesus
    Le Clainche, Soledad
    COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 151
  • [8] Data-Driven Method of Fault Detection in Technical Systems
    Zhirabok, Alexey
    Pavlov, Sergey
    25TH DAAAM INTERNATIONAL SYMPOSIUM ON INTELLIGENT MANUFACTURING AND AUTOMATION, 2014, 2015, 100 : 242 - 248
  • [9] A Novel Data-Driven Fault Detection Method Inspired by Parallel Distributed Compensation
    Chen Zhaoxu
    Fang Huajing
    2015 34TH CHINESE CONTROL CONFERENCE (CCC), 2015, : 6314 - 6319
  • [10] A Novel Data-Driven Learning Method for Radar Target Detection in Nonstationary Environments
    Akcakaya, Murat
    Sen, Satyabrata
    Nehorai, Arye
    IEEE SIGNAL PROCESSING LETTERS, 2016, 23 (05) : 762 - 766