CANF: Clustering and anomaly detection method using nearest and farthest neighbor

Cited by: 22

Authors:

Faroughi, Azadeh [1]

Javidan, Reza [1]

Affiliations:

[1] Shiraz Univ Technol, Comp Engn & IT Dept, Shiraz, Iran

Source:

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2018 / Vol. 89

Keywords:

Nearest neighbor density estimator; Farthest neighbor; Subgroups; Anomaly detection; Clustering; Principal component analysis (PCA); ALGORITHM

DOI:

10.1016/j.future.2018.06.031

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Subject classification code:

081202

Abstract:

Nearest-neighbor density estimators usually do not work well on high-dimensional datasets. Moreover, they have a high time complexity of O(n^2) and require high memory usage, especially when indexing is used. These problems limit their application to small datasets. To overcome these limitations, we propose a new method called CANF, which stands for clustering and anomaly detection using nearest and farthest neighbors. This method calculates distances to the nearest and farthest neighbor nodes to create dataset subgroups. As a result, the computational time complexity is O(n log n) and the space complexity is constant. In each iteration of subgroup formation, outlier points of the subgroups are detected. After subgroup formation, a proposed assembling technique is used to derive the correct clusters. CANF uses a new parameter to detect clusters that are not easily separable. Many experiments on synthetic datasets are carried out to demonstrate the feasibility of CANF. Furthermore, on real-world datasets we compared this algorithm with similar algorithms, namely LOF in the anomaly detection task and DBSCAN in the clustering task, and the results showed significantly higher accuracy for CANF, especially in high dimensions. Moreover, to overcome the problems of high-dimensional datasets, Principal Component Analysis (PCA) is used in the clustering method to preprocess high-dimensional data. The results showed the effectiveness of the proposed method for both clustering and anomaly detection applications. © 2018 Elsevier B.V. All rights reserved.
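
The abstract mentions distances to each point's nearest and farthest neighbors. The short Python sketch below only illustrates those two quantities on a toy dataset. It is not the CANF algorithm: the function name `nearest_and_farthest` and the sample points are hypothetical, and the brute-force loop runs in O(n^2) time, unlike the O(n log n) bound stated for CANF.

```python
import math


def nearest_and_farthest(points):
    """For each point, return (nearest distance, farthest distance) to the others.

    Brute-force illustration only (O(n^2)); this is NOT the CANF procedure.
    Requires at least two points.
    """
    result = []
    for i, p in enumerate(points):
        # Euclidean distances from p to every other point.
        dists = [math.dist(p, q) for j, q in enumerate(points) if j != i]
        result.append((min(dists), max(dists)))
    return result


# Hypothetical toy data: three close points and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
pairs = nearest_and_farthest(points)

# The isolated point has the largest nearest-neighbor distance.
most_isolated = max(range(len(points)), key=lambda i: pairs[i][0])
```

In this toy example, a point whose nearest-neighbor distance is large relative to the others is a natural outlier candidate; how CANF actually forms subgroups and flags outliers is described in the paper itself.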


Pages: 166-177

Number of pages: 12
