Unsupervised Anomaly Detection for Multivariate Incomplete Data using GAN-based Data Imputation: A Comparative Study

Authors
Sarda, Kisan [1]
Yerudkar, Amol [2]
Del Vecchio, Carmen [1]
Affiliations
[1] Univ Sannio, Dept Engn, I-82100 Benevento, Italy
[2] Zhejiang Normal Univ, Sch Math Sci, Jinhua 321004, Zhejiang, Peoples R China
DOI
10.1109/MED59994.2023.10185791
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the increasing interconnectivity of cyber-physical systems (CPSs) in various fields, such as manufacturing plants, power plants, and smart networked systems, large amounts of multivariate data are generated by sensors and actuators, as well as by other sources such as measurements and images. This paper focuses on the anomaly detection (AD) problem, also known as fault detection or outlier detection depending on the type of dataset, which involves identifying anomalous values in a dataset using analytical methods. However, datasets are often incomplete: they contain missing values, which can lead to incorrect outcomes and affect the availability of the already scarce anomalous samples. Therefore, a generalized AD method is proposed for incomplete datasets. It involves two steps: data imputation (DI) using a generative adversarial network (GAN) to obtain complete datasets, followed by AD on the completed datasets. While statistics-based imputation methods are commonly used, they do not consider the data distribution of datasets with anomalous samples. The capabilities of GAN-based DI are tested under different hyperparameter settings and percentages of missing values. The AD problem is then addressed using seven unsupervised AD methods on six different datasets, including a real dataset from a steel manufacturing plant in Italy. Each dataset is analyzed to determine which combination of DI and AD methods performs best. The results show that GAN-based imputation provides the best DI performance, while the reweighted minimum covariance determinant (RMCD) method offers the best overall AD results when combined with GAN.
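The two-step structure described in the abstract (impute first, then detect) can be illustrated with a minimal Python sketch. This is not the paper's method: per-column median imputation stands in for GAN-based DI, and a per-feature median/MAD robust score stands in for RMCD and the other detectors. The function names, the sample data, and the threshold of 3.5 are all assumptions made for illustration.

```python
from statistics import median


def impute_median(rows):
    """Step 1 (DI stand-in): fill each None with the median of the
    observed values in the same column."""
    n_cols = len(rows[0])
    col_medians = [
        median(r[j] for r in rows if r[j] is not None) for j in range(n_cols)
    ]
    return [
        [col_medians[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


def robust_scores(rows):
    """Step 2 (AD stand-in): for each row, the largest per-feature
    robust z-score |x - median| / (1.4826 * MAD)."""
    scores = [0.0] * len(rows)
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        med = median(col)
        mad = median(abs(x - med) for x in col)
        scale = 1.4826 * mad if mad > 0 else 1.0  # avoid division by zero
        for i, x in enumerate(col):
            scores[i] = max(scores[i], abs(x - med) / scale)
    return scores


def detect(rows, threshold=3.5):
    """Run imputation, then flag rows whose score exceeds the threshold
    (3.5 is an assumed rule-of-thumb cutoff, not a value from the paper)."""
    completed = impute_median(rows)
    return [score > threshold for score in robust_scores(completed)]


# Hypothetical two-feature data with missing entries (None);
# the last row is deliberately far from the rest in the first feature.
rows = [
    [1.0, 10.0],
    [1.2, None],
    [0.9, 10.5],
    [None, 9.8],
    [1.1, 10.2],
    [8.0, 10.1],
]
flags = detect(rows)
print(flags)
```

In this sketch the two steps are kept as separate functions, so either stand-in could be swapped for a stronger imputer or a multivariate detector without touching the other.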
Pages: 55-60
Page count: 6
Related papers
50 in total
  • [31] Rich Network Anomaly Detection Using Multivariate Data
    Mendiratta, Veena B.
    Thottan, Marina
    [J]. 2017 IEEE 28TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING WORKSHOPS (ISSREW 2017), 2017, : 48 - 51
  • [32] Anomaly Detection from Incomplete Data
    Liu, Siyuan
    Chen, Lei
    Ni, Lionel M.
    [J]. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2014, 9 (02)
  • [33] Robustness of a multivariate normal approximation for imputation of incomplete binary data
    Bernaards, Coen A.
    Belin, Thomas R.
    Schafer, Joseph L.
    [J]. STATISTICS IN MEDICINE, 2007, 26 (06) : 1368 - 1382
  • [34] GAN-based data augmentation for transcriptomics: survey and comparative assessment
    Lacan, Alice
    Sebag, Michele
    Hanczar, Blaise
    [J]. BIOINFORMATICS, 2023, 39 : I111 - I120
  • [36] Anomaly Detection on Shuttle data using Unsupervised Learning Techniques
    Shriram, S.
    Sivasankar, E.
    [J]. PROCEEDINGS OF 2019 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND KNOWLEDGE ECONOMY (ICCIKE' 2019), 2019, : 221 - 225
  • [37] A novel method for unsupervised anomaly detection using unlabelled data
    Ismail, Abdul Samad Bin Haji
    Abdullah, Abdul Hanan
    Abu Bak, Kamalrulnizam Bin
    Bin Ngadi, Md Asri
    Dahlan, Dahliyusmanto
    Chimphlee, Witcha
    [J]. INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCES AND ITS APPLICATIONS, PROCEEDINGS, 2008, : 252 - +
  • [38] An unsupervised anomaly detection approach based on industrial big data
    Zhang, Cong
    Zhu, Yongsheng
    Ren, Zhijun
    Chen, Kaida
    [J]. 2019 2ND WORLD CONFERENCE ON MECHANICAL ENGINEERING AND INTELLIGENT MANUFACTURING (WCMEIM 2019), 2019, : 703 - 709
  • [39] Method for Incomplete and Imbalanced Data Based on Multivariate Imputation by Chained Equations and Ensemble Learning
    Li, Jiaxi
    Wang, Zhelong
    Wu, Lina
    Qiu, Sen
    Zhao, Hongyu
    Lin, Fang
    Zhang, Ke
    [J]. IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (05) : 3102 - 3113
  • [40] Unsupervised Online Anomaly Detection on Multivariate Sensing Time Series Data for Smart Manufacturing
    Hsieh, Ruei-Jie
    Chou, Jerry
    Ho, Chih-Hsiang
    [J]. 2019 IEEE 12TH CONFERENCE ON SERVICE-ORIENTED COMPUTING AND APPLICATIONS (SOCA 2019), 2019, : 90 - 97