Clustering noisy data in a reduced dimension space via multivariate regression trees

Cited by: 10
Authors
Smyth, C [1]
Coomans, D [1]
Everingham, Y [1]
Affiliations
[1] James Cook Univ N Queensland, Sch Math & Phys Sci, Stat & Intelligent Data Anal Grp, Townsville, Qld 4811, Australia
Keywords
cluster analysis; noise variables; multivariate regression trees; dimension reduction
DOI
10.1016/j.patcog.2005.09.003
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cluster analysis is sensitive to noise variables intrinsically contained within high dimensional data sets. As the size of data sets increases, clustering techniques robust to noise variables must be identified. This investigation gauges the capabilities of recent clustering algorithms applied to two real data sets increasingly perturbed by superfluous noise variables. The recent techniques include mixture models of factor analysers and auto-associative multivariate regression trees. Statistical techniques are integrated to create two approaches useful for clustering noisy data: multivariate regression trees with principal component scores and multivariate regression trees with factor scores. The tree techniques generate the superior clustering results. (c) 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
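The abstract names "multivariate regression trees with principal component scores" but does not describe the procedure, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the original variables act as predictors, that the leading principal component scores act as the multivariate response, and that each leaf of the tree is read as a cluster. The helper names (`pca_scores`, `best_split`, `mrt_clusters`), the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project centred data onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def sse(Y):
    """Sum of squared deviations of the responses from their mean vector."""
    return float(((Y - Y.mean(axis=0)) ** 2).sum())

def best_split(X, Y, min_leaf):
    """Return (gain, feature, threshold) of the best binary split, or None."""
    best = None
    parent = sse(Y)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive observed values.
        for t in (values[:-1] + values[1:]) / 2.0:
            left = X[:, j] <= t
            n_left = int(left.sum())
            if n_left < min_leaf or len(Y) - n_left < min_leaf:
                continue
            gain = parent - sse(Y[left]) - sse(Y[~left])
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best

def mrt_clusters(X, Y, n_leaves, min_leaf=5):
    """Grow a tree greedily until it has n_leaves leaves; leaves are clusters."""
    labels = np.zeros(len(X), dtype=int)
    while labels.max() + 1 < n_leaves:
        candidates = []
        for leaf in range(labels.max() + 1):
            idx = np.flatnonzero(labels == leaf)
            split = best_split(X[idx], Y[idx], min_leaf)
            if split is not None:
                candidates.append((split[0], leaf, split[1], split[2]))
        if not candidates:
            break  # no leaf can be split further
        _, leaf, j, t = max(candidates)  # split the leaf with the largest gain
        idx = np.flatnonzero(labels == leaf)
        labels[idx[X[idx, j] > t]] = labels.max() + 1
    return labels

# Synthetic example (assumed): two groups separated in 2 variables, plus 8 noise variables.
rng = np.random.default_rng(0)
informative = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(8.0, 1.0, (50, 2))])
noise = rng.normal(0.0, 1.0, (100, 8))
X = np.hstack([informative, noise])

scores = pca_scores(X, n_components=1)   # reduced-dimension response
labels = mrt_clusters(X, scores, n_leaves=2)
print(np.bincount(labels))
```

Replacing `pca_scores` with factor scores would give the analogous sketch for the second approach the abstract mentions; how the authors chose the number of components and the tree size is not stated in this record.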
Pages: 424-431
Number of pages: 8
Related papers
50 in total
  • [31] Quantile regression for longitudinal data via the multivariate generalized hyperbolic distribution
    Florez, Alvaro J.
    Keilegom, Ingrid Van
    Molenberghs, Geert
    Verhasselt, Anneleen
    STATISTICAL MODELLING, 2022, 22 (06) : 566 - 584
  • [32] A minimum discrepancy approach to multivariate dimension reduction via k-means inverse regression
    Wen, Xuerong Meggie
    Setodji, C. Messan
    Adekpedjou, Akim
    STATISTICS AND ITS INTERFACE, 2009, 2 (04) : 503 - 511
  • [33] A Unified Scheme for Distance Metric Learning and Clustering via Rank-Reduced Regression
    Guo, Wenzhong
    Shi, Yiqing
    Wang, Shiping
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2021, 51 (08) : 5218 - 5229
  • [35] Natural Pose Generation from a Reduced Dimension Motion Capture Data Space
    Ferrydiansyah, Reza
    Owen, Charles B.
    ADVANCES IN VISUAL COMPUTING, PT 1, PROCEEDINGS, 2009, 5875 : 521 - 530
  • [36] Clustering using Skewed Data via Finite Mixtures of Multivariate Lognormal Distributions
    Deepana, R.
    Kiruthika, C.
    STATISTICS AND APPLICATIONS, 2022, 20 (02) : 219 - 237
  • [37] Clustering multivariate count data via Dirichlet-multinomial network fusion
    Zhao, Xin
    Zhang, Jingru
    Lin, Wei
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2023, 179
  • [38] Multivariate Wind Turbine Power Curve Model Based on Data Clustering and Polynomial LASSO Regression
    Astolfi, Davide
    Pandit, Ravi
    APPLIED SCIENCES-BASEL, 2022, 12 (01)
  • [39] INVESTIGATING SWIMMING TECHNICAL SKILLS BY A DOUBLE PARTITION CLUSTERING OF MULTIVARIATE FUNCTIONAL DATA ALLOWING FOR DIMENSION SELECTION
    Bouvet, Antoine
    El Kolei, Salima
    Marbac, Matthieu
    ANNALS OF APPLIED STATISTICS, 2024, 18 (02) : 1750 - 1772
  • [40] Optimal search space for clustering gene expression data via consensus
    Hirsch, Michael
    Swift, Stephen
    Liu, Xiaohui
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2007, 14 (10) : 1327 - 1341