Outlier mining algorithm for high dimensional categorical data streams based on spectral clustering

Authors
Kang Y.-L. [1]
Feng L.-L. [2]
Zhang J.-A. [3]
Chen F. [4]
Affiliations
[1] School of Computer and Network Engineering, Shanxi Datong University, Datong
[2] School of Education Science and Technology, Shanxi Datong University, Datong
[3] Computer Network Center, Shanxi Datong University, Datong
[4] School of Mathematics and Statistics, Shanxi Datong University, Datong
Keywords
Computer application; Data flow; High dimensional category attribute; Outlier mining; Sliding window; Spectral clustering algorithm
DOI
10.13229/j.cnki.jdxbgxb20210511
Abstract
To detect abnormal data in a data stream promptly and reduce potential threats to the network, an outlier mining algorithm for high-dimensional categorical-attribute data streams based on spectral clustering is proposed. The ordered, high-speed and high-dimensional characteristics of data streams are analyzed, and the main sources of outliers are explored. An attribute-weight quantization method that introduces information entropy is used to merge strongly correlated data streams, and the dimensionality of the data streams is then reduced to limit interference. In the spectral clustering step, the key scale parameters are set, the distance between the sample and the target is calculated from the affinity matrix, spectral clustering is transformed into an undirected graph partitioning problem, the feature matrix is obtained, and the salient outlier features are extracted. In the distance-based mining step, data blocks are added to the data stream, the probability distributions of two adjacent data blocks are compared, a sliding window is set, and the distance between the data and the sliding window is computed and compared with the preset threshold. The outliers are added to the outlier set, which completes the mining. Simulation results show that, for data streams of different sizes and dimensions, the execution time of the algorithm is within 42 s and 40 s, respectively; the algorithm scales well with the size and dimensionality of the data stream, and the mined outliers are consistent with the actual ones. © 2022, Jilin University Press. All rights reserved.
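The entropy weighting and sliding-window distance test described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the spectral clustering step is omitted, and the function names, the weighting rule (attributes with lower entropy in the window receive larger weights), the weighted mismatch distance, the window size and the threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter, deque


def attribute_entropy(values):
    """Shannon entropy (in bits) of one categorical attribute."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_weights(records):
    """Per-attribute weights that sum to 1.

    Assumption: a stable (low-entropy) attribute gets a larger weight,
    so a mismatch on it counts more. The paper's rule may differ.
    """
    columns = list(zip(*records))
    raw = [1.0 / (1.0 + attribute_entropy(col)) for col in columns]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_mismatch(a, b, weights):
    """Weighted Hamming distance between two categorical records, in [0, 1]."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def mine_outliers(stream, window_size=4, threshold=0.5):
    """Flag records whose mean distance to the current window exceeds threshold."""
    window = deque(maxlen=window_size)  # hypothetical fixed-size sliding window
    outliers = []
    for record in stream:
        if len(window) == window_size:
            weights = entropy_weights(window)
            mean_dist = sum(
                weighted_mismatch(record, r, weights) for r in window
            ) / window_size
            if mean_dist > threshold:
                outliers.append(record)
                continue  # keep flagged records out of the window
        window.append(record)  # the oldest record is dropped automatically
    return outliers
```

In this sketch the first window_size records only fill the window and are never tested, and flagged records are kept out of the window so that they do not distort later comparisons. Both choices are simplifications rather than details taken from the paper.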
Pages: 1422-1427
Number of pages: 5
Related papers
10 in total
  • [1] Jiang Feng, Wang Kai-li, Yu Xu, et al., A rough entropy-based approach to outlier detection and its application in unsupervised intrusion detection, Control and Decision, 35, 5, pp. 1199-1204, (2020)
  • [2] Yang Xiao-ling, Feng Shan, Yuan Zhong, Outlier detection based on reversed k-nearest neighborhood MST of relative distance measure, Acta Electronica Sinica, 48, 5, pp. 937-945, (2020)
  • [3] Ye Fu-lan, Clustering algorithm for uncertain data stream based on outlier detection, Journal of China Academy of Electronics and Information Technology, 14, 10, pp. 1094-1099, (2019)
  • [4] Mao Ya-qiong, Tian Li-qin, Wang Yan, et al., Fast outlier detection algorithm in data stream with local density of vector dot product, Computer Engineering, 46, 11, pp. 132-138, (2020)
  • [5] Xie Juan-ying, Ding Li-juan, Wang Ming-zhao, Spectral clustering based unsupervised feature selection algorithms, Journal of Software, 31, 4, pp. 1009-1024, (2020)
  • [6] Yang Zi-ying, Pu Xiao-long, Xu Jia-hui, High-dimensional fault diagnosis by controlling missed discovery excessive probability, Journal of Applied Statistics and Management, 39, 3, pp. 495-510, (2020)
  • [7] Deng Li, Liu Qing-lian, Wu Qun-yong, et al., Anomaly detection and type identification based on spatio-temporal characteristics of data streams in wireless sensor network, Chinese Journal of Sensors and Actuators, 32, 9, pp. 1374-1380, (2019)
  • [8] Chen Shao-bo, Multidimensional sparse data flow anomaly data association mining simulation, Computer Simulation, 36, 9, pp. 342-345, (2019)
  • [9] Zhang Yan-mei, Lu Wei, Yang Yu-wang, Novel data mining framework for vibration data stream based on associated frequency patterns, Journal of Data Acquisition and Processing, 34, 5, pp. 872-882, (2019)
  • [10] Cheng Shi-qing, Hao Wen-yu, Li Chen, et al., Multi-view clustering by low-rank tensor decomposition, Journal of Xi'an Jiaotong University, 54, 3, pp. 119-125, (2020)