Data-driven cluster analysis method: a novel outliers detection method in multivariate data

Authors
Duarte, A. R. [1]
Barbosa, J. J. [1]
Martins, H. S. R. [1]
Oliveira, F. L. P. [1]
Affiliation
[1] Univ Fed Ouro Preto, Stat Dept, Ouro Preto, Brazil
Keywords
Data-driven; Multivariate outliers; Cluster analysis; Bayesian information criterion; Accuracy; Mahalanobis distance; Identification
DOI
10.1080/03610918.2024.2376872
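Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract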
Detecting multivariate outliers is crucial in statistical studies, since statistical analyses carried out without identifying possible outliers may produce incorrect results. This study proposes a new technique for detecting multivariate outliers based on cluster analysis. The method relies on information inherent in the data itself. We compare it with three widely used detection methods, namely techniques based on the Mahalanobis distance. Sensitivity, specificity, and accuracy are used to assess the quality of the methods, together with an analysis of the CPU time each procedure requires. The new technique showed clear superiority over the others.
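The abstract does not spell out the proposed cluster-analysis procedure, so the sketch below illustrates only two generic ingredients it names. The first is a classical Mahalanobis-distance screen of the kind the comparison methods build on. The second is the sensitivity, specificity, and accuracy measures. It is not the authors' method. The function names (mahalanobis_flags, detection_metrics), the restriction to two variables, the significance level alpha = 0.025, and the toy data are all hypothetical choices made for illustration.

```python
import math


def mahalanobis_flags(points, alpha=0.025):
    """Hypothetical helper: flag 2-D points whose squared Mahalanobis distance
    from the sample mean exceeds the (1 - alpha) chi-square(2) quantile."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Unbiased sample covariance entries (divisor n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # must be > 0 (non-degenerate data assumed)
    # With 2 degrees of freedom the chi-square CDF is 1 - exp(-q / 2),
    # so the (1 - alpha) quantile is exactly -2 * ln(alpha).
    cutoff = -2.0 * math.log(alpha)
    flags = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form d' S^{-1} d using the explicit 2 x 2 inverse of S.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        flags.append(d2 > cutoff)
    return flags


def detection_metrics(predicted, actual):
    """Hypothetical helper: return (sensitivity, specificity, accuracy),
    treating 'outlier' as the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")  # outliers caught
    specificity = tn / (tn + fp) if tn + fp else float("nan")  # regulars kept
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


# Toy data (hypothetical): a 5 x 5 grid of regular points plus one planted outlier.
points = [(x, y) for x in range(5) for y in range(5)] + [(20, 20)]
truth = [False] * 25 + [True]
flags = mahalanobis_flags(points)
print(detection_metrics(flags, truth))  # -> (1.0, 1.0, 1.0)
```

In this toy case the planted point's squared distance (about 22.25) clears the cutoff (about 7.38), while no grid point exceeds about 4.04. A single extreme point also inflates the classical mean and covariance estimates (the masking effect). That is one reason alternatives to the plain Mahalanobis screen are of interest.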
Pages: 21