A hybrid dimensionality reduction method for outlier detection in high-dimensional data

Times cited: 1
Authors
Meng, Guanglei [1]
Wang, Biao [1]
Wu, Yanming [1]
Zhou, Mingzhe [1]
Meng, Tiankuo [1]
Affiliation
[1] Shenyang Aerospace University, School of Automation, Shenyang 110136, People's Republic of China
Funding
U.S. National Science Foundation
Keywords
Outlier detection; Anomaly detection; Dimensionality reduction; High-dimensional data; Ensemble learning; Feature extraction; Ensemble; PCA
DOI
10.1007/s13042-023-01859-w
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Outlier detection becomes challenging when the data are high-dimensional. A straightforward solution is to use dimensionality reduction (DR) techniques to discard irrelevant attributes. However, because outliers are rare, heterogeneous, and boundless in nature, it is difficult for a single DR algorithm to discover all of them. In this paper, we propose a hybrid DR method for outlier detection based on ensemble learning. In the ensemble generation phase, multiple algorithms with different parameter settings are used to generate accurate and diverse base detectors. In the ensemble combination phase, a two-stage combination function is applied. Our framework takes both variance reduction and bias reduction into account. More importantly, the framework is highly flexible, so any outlier detection algorithm can be used within it. Fifteen high-dimensional data sets from the KEEL repository and one image data set are used to validate the performance of our method. One semi-supervised and one unsupervised outlier detection algorithm are used in separate experiments. Despite subtle differences between the two, both experiments confirm the advantage of our method. Moreover, the contributions of the two ingredients of our method are verified via two pairs of experimental comparisons.
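The abstract does not name the DR algorithms, base detector, or combination function the authors use, so the following is only a minimal Python sketch of the general pattern it describes, under loudly stated assumptions. Gaussian random projection and random feature subspaces stand in for "multiple algorithms with different parameter settings", a k-nearest-neighbour distance score stands in for the base detector, scores are z-score normalized, and the two-stage combination is a maximum of averages (average within each DR method, then take the maximum across methods), a common outlier-ensemble heuristic. All function names (`random_projection`, `feature_subspace`, `knn_scores`, `standardize`, `hybrid_dr_ensemble`) and parameter values are hypothetical and are not taken from the paper.

```python
import math
import random
import statistics

# Hypothetical sketch of a hybrid-DR outlier ensemble; NOT the paper's exact method.


def random_projection(data, k, seed):
    """Assumed DR method 1: project rows onto k Gaussian random directions."""
    rng = random.Random(seed)
    d = len(data[0])
    # d x k matrix with N(0, 1/k) entries (Johnson-Lindenstrauss style scaling).
    proj = [[rng.gauss(0.0, 1.0 / math.sqrt(k)) for _ in range(k)] for _ in range(d)]
    return [[sum(row[i] * proj[i][j] for i in range(d)) for j in range(k)] for row in data]


def feature_subspace(data, k, seed):
    """Assumed DR method 2: keep a random subset of k original features."""
    rng = random.Random(seed)
    cols = rng.sample(range(len(data[0])), k)
    return [[row[c] for c in cols] for row in data]


def knn_scores(data, n_neighbors=5):
    """Assumed base detector: distance to the n_neighbors-th nearest neighbour."""
    scores = []
    for i, p in enumerate(data):
        dists = sorted(math.dist(p, q) for j, q in enumerate(data) if j != i)
        scores.append(dists[n_neighbors - 1])
    return scores


def standardize(scores):
    """Z-score normalization so that different detectors share one scale."""
    mu = statistics.fmean(scores)
    sd = statistics.pstdev(scores) or 1.0  # guard against constant scores
    return [(s - mu) / sd for s in scores]


def hybrid_dr_ensemble(data, dims=(2, 3, 4), seeds=(0, 1, 2), n_neighbors=5):
    """Ensemble generation over DR methods and parameters, then two-stage combination."""
    methods = [random_projection, feature_subspace]
    group_means = []
    for reducer in methods:
        # One base detector per (target dimension, seed) pair for this DR method.
        member_scores = [
            standardize(knn_scores(reducer(data, k, s), n_neighbors))
            for k in dims
            for s in seeds
        ]
        # Stage 1: averaging within a group mainly reduces variance.
        group_means.append([statistics.fmean(col) for col in zip(*member_scores)])
    # Stage 2: the maximum across groups keeps an outlier exposed by any one DR method.
    return [max(col) for col in zip(*group_means)]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(40)]
    data.append([8.0] * 6)  # one planted outlier at index 40
    scores = hybrid_dr_ensemble(data)
    print(max(range(len(scores)), key=scores.__getitem__))
```

In this toy setting the planted point should receive the highest combined score. Averaging first and taking the maximum second is one way to reflect the abstract's point that both variance and bias matter; the reverse order (maximum of groups, then average) is an equally plausible reading, and the paper itself should be consulted for the actual function.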
Pages: 3705-3718
Number of pages: 14
Journal: International Journal of Machine Learning and Cybernetics, 2023, vol. 14, pp. 3705-3718