CANF: Clustering and anomaly detection method using nearest and farthest neighbor

Cited by: 22
Authors
Faroughi, Azadeh [1]
Javidan, Reza [1]
Affiliations
[1] Shiraz Univ Technol, Comp Engn & IT Dept, Shiraz, Iran
Keywords
Nearest neighbor density estimator; Farthest neighbor; Subgroups; Anomaly detection; Clustering; Principal component analysis (PCA); ALGORITHM;
DOI
10.1016/j.future.2018.06.031
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Nearest-neighbor density estimators usually perform poorly on high-dimensional datasets. Moreover, they have a high time complexity of O(n^2) and require a large amount of memory, especially when indexing is used. These problems limit their use to small datasets. To overcome these limitations, we propose a new method called CANF, which stands for clustering and anomaly detection using nearest and farthest neighbors. The method calculates the distances to the nearest and farthest neighbor nodes to create subgroups of the dataset. As a result, the computational time complexity is O(n log n) and the space complexity is constant. In each iteration of subgroup formation, the outlier points of the subgroups are detected. After subgroup formation, a proposed assembling technique is used to derive the correct clusters. CANF uses a new parameter to detect clusters that are not easily separable. Many experiments on synthetic datasets are carried out to demonstrate the feasibility of CANF. Furthermore, on real-world datasets we compared the algorithm with similar algorithms, namely LOF for the anomaly detection task and DBSCAN for the clustering task, and the results showed significantly higher accuracy for CANF, especially in high dimensions. Moreover, to address the problems of high-dimensional datasets, Principal Component Analysis (PCA) is used in the clustering method to preprocess high-dimensional data. The results showed the effectiveness of the proposed method for both clustering and anomaly detection applications. (C) 2018 Elsevier B.V. All rights reserved.
Pages: 166-177
Page count: 12
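The abstract says the method relies on each point's distance to its nearest and farthest neighbors. As a rough illustration of those two quantities only, the sketch below computes them by brute force for a tiny 2-D dataset. It is not the CANF algorithm: it forms no subgroups, it runs in O(n^2) rather than the O(n log n) reported in the abstract, and the function name `nearest_farthest_distances` and the sample points are hypothetical.

```python
import math


def nearest_farthest_distances(points):
    """Return (nearest, farthest) Euclidean distances from each point to any other point.

    Brute-force O(n^2) illustration; an assumption for demonstration,
    not the subgroup-based procedure described in the paper.
    """
    result = []
    for i, p in enumerate(points):
        nearest = math.inf
        farthest = 0.0
        for j, q in enumerate(points):
            if i == j:
                continue  # skip the point itself
            d = math.dist(p, q)
            nearest = min(nearest, d)
            farthest = max(farthest, d)
        result.append((nearest, farthest))
    return result


# Hypothetical sample: three nearby points and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
distances = nearest_farthest_distances(points)
for p, (near, far) in zip(points, distances):
    print(p, round(near, 3), round(far, 3))
```

In this toy dataset the isolated point has a much larger nearest-neighbor distance than the other three, which is the kind of signal a neighbor-based anomaly detector can exploit.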