Outlier detection algorithm based on fluctuation of centroid projection

Cited by: 0
Authors
Zhang Z. [1, 2, 3]
Zhang Y. [1]
Liu W. [1]
Deng Y. [1]
Affiliations
[1] College of Information Science and Engineering, Yanshan University, Qinhuangdao
[2] The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Yanshan University, Qinhuangdao
[3] The Key Laboratory of Software Engineering of Hebei Province, Qinhuangdao
Keywords
centroid projection fluctuation; data mining; k-nearest neighbors; neighbor tree; outlier detection
DOI
10.13196/j.cims.2022.12.014
Abstract
Outlier detection is an important area of data mining research. Traditional nearest-neighbor-based outlier detection methods make wide use of the k-nearest neighbor relationship. However, as data distributions become more diverse and data dimensionality grows, outlier detection based on the k-nearest neighbor relationship is easily affected by the presence of different clusters, and its detection performance is unsatisfactory. To address these problems, a nearest neighbor tree was introduced in place of the k-nearest neighbor relationship to generate a new neighborhood set, and the concept of centroid projection was proposed to describe the distribution characteristics of a data object and its neighbors. As the number of neighbors of a data object gradually increases, the centroid projections of outliers and inliers behave differently; on this basis, centroid projection fluctuation was proposed to measure the outlier degree of each data object, yielding an outlier detection algorithm based on the fluctuation of centroid projection. Experiments on artificial and real data sets showed that the proposed algorithm could detect outliers effectively and comprehensively. © 2022 CIMS. All rights reserved.
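The abstract positions the proposed method against traditional k-nearest-neighbor outlier detection. For reference only, the sketch below implements one common k-NN variant, which scores each object by the distance to its k-th nearest neighbor. It is not the centroid-projection-fluctuation algorithm, whose definitions are not given in this abstract. The function names `knn_outlier_scores` and `top_n_outliers`, the Euclidean metric, and the brute-force neighbor search are assumptions made for illustration.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_outlier_scores(points: Sequence[Point], k: int) -> List[float]:
    """Hypothetical helper: score each point by the Euclidean distance
    to its k-th nearest neighbor (larger score = more outlying).

    Brute force: O(n^2) distance computations, fine for small examples.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point; p itself is excluded.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])  # distance to the k-th nearest neighbor
    return scores


def top_n_outliers(points: Sequence[Point], k: int, n_outliers: int) -> List[int]:
    """Hypothetical helper: indices of the n_outliers highest-scoring points."""
    scores = knn_outlier_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_outliers]


if __name__ == "__main__":
    # Four tightly grouped points plus one distant point.
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
    print(top_n_outliers(data, k=2, n_outliers=1))  # prints [4]
```

Because this score applies one global distance scale, points belonging to a sparse but legitimate cluster can receive scores as high as those of true outliers. This is one way the sensitivity to different clusters that the abstract attributes to k-NN-based detection can show up in practice.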
Pages: 3869-3878
Number of pages: 9