Two-phase clustering process for outliers detection

Cited by: 227
Authors
Jiang, MF [1]
Tseng, SS [1]
Su, CM [1]
Affiliation
[1] Natl Chiao Tung Univ, Dept Comp & Informat Sci, Hsinchu 30050, Taiwan
Keywords
outliers; k-means clustering; two-phase clustering; MST;
DOI
10.1016/S0167-8655(00)00131-8
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, a two-phase clustering algorithm for outliers detection is proposed. We first modify the traditional k-means algorithm in Phase 1 by using a heuristic "if one new input pattern is far enough away from all clusters centers, then assign it as a new cluster center". It results that the data points in the same cluster may be most likely all outliers or all non-outliers. And then we construct a minimum spanning tree (MST) in Phase 2 and remove the longest edge. The small clusters, the tree with less number of nodes, are selected and regarded as outlier. The experimental results show that our process works well. (C) 2001 Elsevier Science B.V. All rights reserved.
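
The two phases described in the abstract can be illustrated with a minimal Python sketch. It is a simplified reading of the abstract only, not the authors' implementation: the function names (`phase1`, `phase2_outlier_clusters`, `detect_outliers`), the fixed distance `threshold` standing in for "far enough away", the single-pass running-mean update of cluster centers, and the tie-breaking rule are all assumptions made for illustration. Any further steps the paper may apply are not modelled here.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def phase1(points, threshold):
    """Phase 1 (assumed simplification): one pass over the data.

    A point farther than `threshold` from every existing center starts a
    new cluster; otherwise it joins the nearest cluster and that center is
    updated as a running mean of its members.
    """
    centers, counts, labels = [], [], []
    for p in points:
        if centers:
            d, j = min((euclid(p, c), j) for j, c in enumerate(centers))
        if not centers or d > threshold:
            centers.append(list(p))
            counts.append(1)
            labels.append(len(centers) - 1)
        else:
            counts[j] += 1
            centers[j] = [c + (x - c) / counts[j] for c, x in zip(centers[j], p)]
            labels.append(j)
    return centers, labels


def phase2_outlier_clusters(centers):
    """Phase 2: build an MST over the cluster centers (Prim's algorithm),
    remove the longest edge, and return the node set of the smaller subtree.
    Ties go to the subtree that does not contain node 0 (an assumption).
    """
    n = len(centers)
    if n < 2:
        return set()
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        w, u, v = min(
            (euclid(centers[u], centers[v]), u, v)
            for u in in_tree
            for v in range(n)
            if v not in in_tree
        )
        edges.append((w, u, v))
        in_tree.add(v)
    edges.remove(max(edges))  # drop the longest MST edge

    # Collect the component containing node 0 by depth-first search.
    adj = {i: [] for i in range(n)}
    for _, u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    other = set(range(n)) - seen
    return seen if len(seen) < len(other) else other


def detect_outliers(points, threshold):
    """Run both phases and return the points in the flagged clusters."""
    centers, labels = phase1(points, threshold)
    flagged = phase2_outlier_clusters(centers)
    return [p for p, lab in zip(points, labels) if lab in flagged]


# Hypothetical toy data: three nearby groups and one isolated point.
data = [
    (0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
    (2.0, 0.0), (2.0, 0.5), (2.5, 0.5),
    (4.0, 0.0), (4.0, 0.5),
    (10.0, 10.0),
]
print(detect_outliers(data, threshold=1.0))  # [(10.0, 10.0)]
```

In this toy run, Phase 1 produces four cluster centers. The longest MST edge is the one connecting the isolated center at (10, 10), so removing it leaves a one-node subtree that is flagged as the outlier cluster.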
Pages: 691-700
Number of pages: 10