Copula-Based Anomaly Scoring and Localization for Large-Scale, High-Dimensional Continuous Data

Cited by: 8
Authors
Horvath, Gabor [1]
Kovacs, Edith [2, 3]
Molontay, Roland [4, 5]
Novaczki, Szabolcs [6]
Affiliations
[1] Budapest Univ Technol & Econ, Dept Networked Syst & Serv, Magyar Tudosok Krt 2, H-1117 Budapest, Hungary
[2] Univ Debrecen, Fac Informat, Muegyet Rkp 3, H-1111 Budapest, Hungary
[3] Budapest Univ Technol & Econ, Dept Differential Equat, Muegyet Rkp 3, H-1111 Budapest, Hungary
[4] Univ Debrecen, Fac Informat, MTA BME Stochast Res Grp, POB 91, H-1521 Budapest, Hungary
[5] Budapest Univ Technol & Econ, Dept Stochast, POB 91, H-1521 Budapest, Hungary
[6] Bell Labs, Nokia, Bokay Janos Utca 36-42, H-1083 Budapest, Hungary
Keywords
Anomaly scoring; unsupervised learning; copula fitting; outlier detection; probability distributions; vines
DOI
10.1145/3372274
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The anomaly detection method presented in this article has a special feature: it not only indicates whether an observation is anomalous but also identifies what exactly makes an anomalous observation unusual. Hence, it helps localize the cause of the anomaly. The proposed approach is model-based; it relies on the multivariate probability distribution associated with the observations. Since rare events lie in the tails of probability distributions, we use copula functions, which can model fat-tailed distributions well. The presented procedure scales well; it can cope with a large number of high-dimensional samples. Furthermore, it can cope with missing values, which occur frequently in high-dimensional datasets. In the second part of the article, we demonstrate the usability of the method through a case study in which we analyze a large dataset consisting of the performance counters of a real mobile telecommunication network. Since such networks are complex systems, signs of sub-optimal operation can remain hidden for a potentially long time. With the proposed procedure, many such hidden issues can be isolated and reported to the network operator.
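The abstract does not spell out the scoring procedure, so the following is only a minimal illustrative sketch of the general idea (score an observation under a fitted copula, then ask which coordinate is responsible), not the article's method. It swaps in a Gaussian copula, which has no tail dependence and is therefore a simplification relative to the fat-tailed modeling the abstract emphasizes. All names (`normal_scores`, `score_rows`, the synthetic `rows`) are hypothetical, and the localization rule (standardized conditional residual of each coordinate given the others) is an assumption made for illustration.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def normal_scores(column):
    """Map one feature to normal scores through its empirical CDF, rank / (n + 1)."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        z[i] = _STD_NORMAL.inv_cdf(rank / (n + 1))
    return z

def correlation(zcols):
    """Sample correlation matrix of the normal-score columns."""
    d, n = len(zcols), len(zcols[0])
    cen = [[v - sum(c) / n for v in c] for c in zcols]
    norms = [sum(v * v for v in c) ** 0.5 for c in cen]
    return [[sum(a * b for a, b in zip(cen[i], cen[j])) / (norms[i] * norms[j])
             for j in range(d)] for i in range(d)]

def invert(m):
    """Gauss-Jordan inverse with partial pivoting (fine for a small dense matrix)."""
    d = len(m)
    a = [list(row) + [float(i == j) for j in range(d)] for i, row in enumerate(m)]
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [v / piv for v in a[c]]
        for r in range(d):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[d:] for row in a]

def score_rows(rows):
    """Return (scores, residuals) for every row.

    score    : negative log Gaussian-copula density, up to an additive constant.
    residual : per-feature standardized conditional residual given the other
               features (an assumed localization rule, not the article's).
    """
    d = len(rows[0])
    zcols = [normal_scores([r[j] for r in rows]) for j in range(d)]
    P = invert(correlation(zcols))  # precision matrix of the normal scores
    scores, residuals = [], []
    for i in range(len(rows)):
        z = [zcols[j][i] for j in range(d)]
        Pz = [sum(P[j][k] * z[k] for k in range(d)) for j in range(d)]
        quad = sum(zj * pj for zj, pj in zip(z, Pz))
        scores.append(0.5 * (quad - sum(v * v for v in z)))
        residuals.append([Pz[j] / P[j][j] ** 0.5 for j in range(d)])
    return scores, residuals

# Synthetic data (an assumption): features 0 and 1 move together, feature 2 is independent.
rng = random.Random(0)
rows = []
for _ in range(200):
    a = rng.gauss(0.0, 1.0)
    rows.append([a, a + 0.1 * rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
rows.append([1.5, -1.5, 0.0])  # each value is ordinary alone; together they break the dependence

scores, residuals = score_rows(rows)
worst = max(range(len(rows)), key=scores.__getitem__)
culprit = max(range(3), key=lambda j: abs(residuals[worst][j]))
print(worst, culprit)
```

The appended row is unremarkable in each marginal, so a per-feature threshold would not flag it; the copula score flags it because it violates the dependence structure, and the conditional residuals point at the two coupled features rather than the independent one. The sketch scores only in-sample rows and assumes complete data, so it does not address the missing-value handling or the scalability the abstract mentions.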
Pages: 26