Clustering noisy data in a reduced dimension space via multivariate regression trees

Cited by: 10
Authors
Smyth, C [1]
Coomans, D [1]
Everingham, Y [1]
Affiliations
[1] James Cook Univ N Queensland, Sch Math & Phys Sci, Stat & Intelligent Data Anal Grp, Townsville, Qld 4811, Australia
Keywords
cluster analysis; noise variables; multivariate regression trees; dimension reduction
DOI
10.1016/j.patcog.2005.09.003
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cluster analysis is sensitive to noise variables intrinsically contained within high dimensional data sets. As the size of data sets increases, clustering techniques robust to noise variables must be identified. This investigation gauges the capabilities of recent clustering algorithms applied to two real data sets increasingly perturbed by superfluous noise variables. The recent techniques include mixture models of factor analysers and auto-associative multivariate regression trees. Statistical techniques are integrated to create two approaches useful for clustering noisy data: multivariate regression trees with principal component scores and multivariate regression trees with factor scores. The tree techniques generate the superior clustering results. (c) 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Pages: 424-431
Page count: 8
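
The abstract describes clustering with multivariate regression trees fitted to principal component scores. A minimal sketch of one plausible reading of that idea follows; it is not the authors' implementation. It assumes scikit-learn's PCA and DecisionTreeRegressor as stand-ins, treats the tree as auto-associative (the scores serve as both predictors and responses), and takes each leaf as a cluster. The function name cluster_with_pca_tree, its parameters, and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeRegressor


def cluster_with_pca_tree(X, n_components=2, n_clusters=3, random_state=0):
    """Hypothetical helper: cluster rows of X via a tree grown on PCA scores."""
    # Step 1: reduce dimension, keeping only the leading principal component scores.
    scores = PCA(n_components=n_components, random_state=random_state).fit_transform(X)
    # Step 2: auto-associative multi-output regression tree (assumption):
    # the scores are used as both predictors and responses.
    tree = DecisionTreeRegressor(max_leaf_nodes=n_clusters, random_state=random_state)
    tree.fit(scores, scores)
    # Step 3: each leaf is treated as one cluster; relabel leaf ids as 0..k-1.
    leaves = tree.apply(scores)
    _, labels = np.unique(leaves, return_inverse=True)
    return labels


# Synthetic illustration (made-up data, not the data sets used in the paper):
# two informative variables with three groups, plus eight pure-noise variables.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
signal = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
noise = rng.normal(scale=0.5, size=(150, 8))
X = np.hstack([signal, noise])

labels = cluster_with_pca_tree(X)
```

The number of leaves is capped with max_leaf_nodes here only for simplicity; how the tree size and the number of retained components are chosen in the paper is not stated in the abstract.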