Clustering noisy data in a reduced dimension space via multivariate regression trees

Cited by: 10
Authors
Smyth, C [1]
Coomans, D [1]
Everingham, Y [1]
Affiliations
[1] James Cook Univ N Queensland, Sch Math & Phys Sci, Stat & Intelligent Data Anal Grp, Townsville, Qld 4811, Australia
Keywords
cluster analysis; noise variables; multivariate regression trees; dimension reduction
DOI
10.1016/j.patcog.2005.09.003
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cluster analysis is sensitive to noise variables intrinsically contained within high dimensional data sets. As the size of data sets increases, clustering techniques robust to noise variables must be identified. This investigation gauges the capabilities of recent clustering algorithms applied to two real data sets increasingly perturbed by superfluous noise variables. The recent techniques include mixture models of factor analysers and auto-associative multivariate regression trees. Statistical techniques are integrated to create two approaches useful for clustering noisy data: multivariate regression trees with principal component scores and multivariate regression trees with factor scores. The tree techniques generate the superior clustering results. (c) 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Pages: 424 - 431
Number of pages: 8
Related papers
50 in total
  • [41] Hierarchical Reduced-Space Drift Detection Framework for Multivariate Supervised Data Streams
    Zhang, Shuyi
    Tino, Peter
    Yao, Xin
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (03) : 2628 - 2640
  • [42] Incorporating external data into the analysis of clinical trials via Bayesian additive regression trees
    Zhou, Tianjian
    Ji, Yuan
    STATISTICS IN MEDICINE, 2021, 40 (28) : 6421 - 6442
  • [43] A REDUCED-SPACE APPROACH TO THE CLUSTERING OF CATEGORICAL-DATA IN MARKET-SEGMENTATION
    GREEN, PE
    SCHAFFER, CM
    PATTERSON, KM
    JOURNAL OF THE MARKET RESEARCH SOCIETY, 1988, 30 (03) : 267 - 288
  • [44] Materials discovery via topologically-correct display of reduced-dimension data
    Pao, YH
    Meng, Z
    LeClair, S
    Igelnik, B
    JOURNAL OF ALLOYS AND COMPOUNDS, 1998, 279 (01) : 22 - 29
  • [45] PCA-based high-dimensional noisy data clustering via control of decision errors
    Lee, Jeonghwa
    Jun, Chi-Hyuck
    KNOWLEDGE-BASED SYSTEMS, 2013, 37 : 338 - 345
  • [46] Spatial Clustering Regression of Count Value Data via Bayesian Mixture of Finite Mixtures
    Zhao, Peng
    Yang, Hou-Cheng
    Dey, Dipak K.
    Hu, Guanyu
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023 : 3504 - 3512
  • [47] Atmospheric and surface parameter retrievals from multispectral thermal imagery via reduced-rank multivariate regression
    Hernandez-Baquero, ED
    Schott, JR
    IGARSS 2000: IEEE 2000 INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, VOL I - VI, PROCEEDINGS, 2000 : 1525 - 1527
  • [48] Application of a local attractor dimension to reduced space strongly coupled data assimilation for chaotic multiscale systems
    Quinn, Courtney
    O'Kane, Terence J.
    Kitsios, Vassili
    NONLINEAR PROCESSES IN GEOPHYSICS, 2020, 27 (01) : 51 - 74
  • [49] Approximating Numeric Role Fillers via Predictive Clustering Trees for Knowledge Base Enrichment in the Web of Data
    Rizzo, Giuseppe
    d'Amato, Claudia
    Fanizzi, Nicola
    Esposito, Floriana
    DISCOVERY SCIENCE (DS 2016), 2016, 9956 : 101 - 117
  • [50] Robust learning from noisy, incomplete, high-dimensional experimental data via physically constrained symbolic regression
    Patrick A. K. Reinbold
    Logan M. Kageorge
    Michael F. Schatz
    Roman O. Grigoriev
    Nature Communications, 12