A High-Dimensional Outlier Detection Approach Based on Local Coulomb Force

Cited by: 5
Authors
Zhu, Pengyun [1]
Zhang, Chaowei [2]
Li, Xiaofeng [1]
Zhang, Jifu [1]
Qin, Xiao [3]
Affiliations
[1] Taiyuan Univ Sci & Technol TYUST, Sch Comp Sci & Technol, Taiyuan 030024, Shanxi, Peoples R China
[2] Yangzhou Univ, Dept Comp Sci, Yangzhou 225127, Jiangsu, Peoples R China
[3] Auburn Univ, Samuel Ginn Coll Engn, Dept Comp Sci & Software Engn, Auburn, AL 36849 USA
Funding
National Science Foundation (USA); National Natural Science Foundation of China
Keywords
Force; Anomaly detection; Task analysis; Interference; Force measurement; Indexes; Euclidean distance; High-dimensional outlier detection; Similarity metric; Outlier Coulomb resultant force; Local outlier Coulomb force; Neighborhood outlier factor; Algorithm
DOI
10.1109/TKDE.2022.3172167
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Traditional outlier detection methods are inadequate for high-dimensional data analysis because distances tend to concentrate (the "curse of dimensionality"). Inspired by Coulomb's law, we propose a new similarity measure vector for high-dimensional data, which consists of the outlier Coulomb force and the outlier Coulomb resultant force. The outlier Coulomb force not only gauges similarity among data objects effectively, but also fully reflects differences among the dimensions of data objects through vector projection onto each dimension. More importantly, the Coulomb resultant force can effectively measure the deviation of data objects from a data center, making detection results interpretable. We introduce a new neighborhood outlier factor, which drives the development of a high-dimensional outlier detection algorithm. In our approach, attribute values with a high degree of deviation are treated as interpretable information about outlier data. Finally, we implement and evaluate our algorithm using UCI and synthetic datasets. Our experimental results show that the algorithm effectively alleviates the interference of the "curse of dimensionality". The findings confirm that the high-dimensional outliers identified by the algorithm are interpretable.
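The distance concentration effect that the abstract names as its motivation can be illustrated with a small experiment. The sketch below is not the paper's algorithm. It only measures how the relative contrast between the farthest and nearest neighbor of a random query point shrinks as dimensionality grows. The function name `relative_contrast`, the uniform random data, and the sample sizes are all assumptions chosen for illustration.

```python
import numpy as np


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for Euclidean distances from a
    random query point to n_points uniform random points in [0, 1]^dim.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    data = rng.random((n_points, dim))   # uniform samples in the unit cube
    query = rng.random(dim)              # one random query point
    dists = np.linalg.norm(data - query, axis=1)
    return float((dists.max() - dists.min()) / dists.min())


if __name__ == "__main__":
    # The contrast typically drops sharply as the dimension increases.
    for dim in (2, 10, 100, 1000):
        print(dim, relative_contrast(500, dim))
```

When the contrast approaches zero, nearest and farthest neighbors become nearly indistinguishable by Euclidean distance alone, which is the difficulty that distance-based outlier scores face in high dimensions.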
Pages: 5506-5520
Page count: 15