Data-driven cluster analysis method: a novel outliers detection method in multivariate data

Authors
Duarte, A. R. [1]
Barbosa, J. J. [1]
Martins, H. S. R. [1]
Oliveira, F. L. P. [1]
Affiliation
[1] Univ Fed Ouro Preto, Stat Dept, Ouro Preto, Brazil
Keywords
Data-driven; Multivariate outliers; Cluster analysis; Bayesian information criterion; Accuracy; Mahalanobis distance; Identification
DOI
10.1080/03610918.2024.2376872
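Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract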
Detecting multivariate outliers is crucial in statistical studies, because analyses carried out without first identifying possible outliers may yield incorrect results. This study proposes a new technique for detecting multivariate outliers based on cluster analysis. The method relies on information inherent in the data itself. We compare it with three widely used detection methods based on the Mahalanobis distance. Sensitivity, specificity, and accuracy are used to assess the quality of the methods, together with an analysis of the CPU time each procedure requires. The new technique showed a marked superiority over the others.
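
The abstract names Mahalanobis-distance detection and the sensitivity, specificity, and accuracy measures without spelling them out. The following is a minimal sketch of a classical Mahalanobis baseline for two-dimensional data, together with those three measures. It is not the cluster-based method the paper proposes, and it is not necessarily any of the three comparison methods. The function names, the 0.975 chi-square level, and the restriction to two dimensions (which gives a closed-form chi-square quantile) are all assumptions made for illustration.

```python
import math


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix (denominator n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form using the closed-form inverse of a 2x2 matrix.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def chi2_cutoff_2d(level=0.975):
    """Chi-square quantile with 2 degrees of freedom: CDF is 1 - exp(-x / 2)."""
    return -2.0 * math.log(1.0 - level)


def flag_outliers(points, level=0.975):
    """Flag points whose squared distance exceeds the chi-square cutoff."""
    cutoff = chi2_cutoff_2d(level)
    return [d > cutoff for d in mahalanobis_sq_2d(points)]


def confusion_metrics(truth, flagged):
    """Return (sensitivity, specificity, accuracy).

    Assumes `truth` contains at least one outlier and one non-outlier.
    """
    tp = sum(t and f for t, f in zip(truth, flagged))
    tn = sum((not t) and (not f) for t, f in zip(truth, flagged))
    fp = sum((not t) and f for t, f in zip(truth, flagged))
    fn = sum(t and (not f) for t, f in zip(truth, flagged))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(truth)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # Hypothetical data: 19 points on the unit circle plus one distant point.
    data = [(math.cos(i), math.sin(i)) for i in range(19)] + [(10.0, 10.0)]
    labels = [False] * 19 + [True]
    print(confusion_metrics(labels, flag_outliers(data)))
```

Because the sample mean and covariance are themselves distorted by outliers, a baseline of this kind can mask contaminated points, which is one common motivation for robust or cluster-based alternatives.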
Pages: 21