Choice of distance matrices in cluster analysis: Defining regions

Authors
Mimmack, GM
Mason, SJ [1]
Galpin, JS
Affiliations
[1] Univ Calif San Diego, Scripps Inst Oceanog, Div Climate Res, La Jolla, CA 92093 USA
[2] Univ Witwatersrand, Dept Stat & Actuarial Sci, Johannesburg, South Africa
DOI
10.1175/1520-0442(2001)014<2790:CODMIC>2.0.CO;2
Chinese Library Classification (CLC) number
P4 [Atmospheric Sciences (Meteorology)]
Discipline classification code
0706; 070601
Abstract
Cluster analysis is a technique frequently used in climatology for grouping cases to define classes (synoptic types or climate regimes, for example), or for grouping stations or grid points to define regions. Cluster analysis is based on some form of distance matrix, and the most commonly used metric in the climatological field has been Euclidean distances. Arguments for the use of Euclidean distances are in some ways similar to arguments for using a covariance matrix in principal components analysis: the use of the metric is valid if all data are measured on the same scale. When using Euclidean distances for cluster analysis, however, the additional assumption is made that all the variables are uncorrelated, and this assumption is frequently ignored. Two possible methods of dealing with the correlation between the variables are considered: performing a principal components analysis before calculating Euclidean distances, and calculating Mahalanobis distances using the raw data. Under certain conditions calculating Mahalanobis distances is equivalent to calculating Euclidean distances from the principal components. It is suggested that when cluster analysis is used for defining regions, Mahalanobis distances are inappropriate, and that Euclidean distances should be calculated using the unstandardized principal component scores based on only the major principal components.
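The equivalence mentioned in the abstract can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it uses synthetic data, and it assumes that the "certain conditions" are that all principal components are retained, that the scores are standardized by the square roots of their eigenvalues, and that the same sample covariance matrix is used for both calculations. All variable and function names are hypothetical.

```python
import numpy as np

# Synthetic, correlated data (assumption: 200 cases x 3 variables, for illustration only).
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.2, 1.0, 0.0],
                   [0.5, 0.3, 0.4]])
X = rng.standard_normal((200, 3)) @ mixing.T

Xc = X - X.mean(axis=0)              # centre each variable
S = np.cov(Xc, rowvar=False)         # sample covariance matrix

# Principal components from the covariance matrix, sorted by decreasing variance.
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs                    # unstandardized PC scores
std_scores = scores / np.sqrt(eigvals)   # standardized PC scores (unit variance)


def pairwise_euclidean(A):
    """Euclidean distance between every pair of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def pairwise_mahalanobis(A, cov):
    """Mahalanobis distance between every pair of rows of A, given cov."""
    inv = np.linalg.inv(cov)
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt(np.einsum("ijk,kl,ijl->ij", diff, inv, diff))


d_mahal = pairwise_mahalanobis(X, S)          # Mahalanobis on the raw data
d_std_pc = pairwise_euclidean(std_scores)     # Euclidean on all standardized PCs
d_raw = pairwise_euclidean(X)                 # Euclidean on the raw data
d_all_pc = pairwise_euclidean(scores)         # Euclidean on all unstandardized PCs
d_major = pairwise_euclidean(scores[:, :2])   # Euclidean on the 2 leading PCs only

print(np.allclose(d_mahal, d_std_pc))   # Mahalanobis == Euclidean on standardized PCs
print(np.allclose(d_raw, d_all_pc))     # rotation to PCs preserves Euclidean distance
```

Under these assumptions, Mahalanobis distances weight every component equally, including the minor ones. `d_major` corresponds to the kind of matrix the abstract recommends for defining regions: Euclidean distances from unstandardized scores on the major components only. The choice of two components here is arbitrary.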
Pages: 2790-2797
Number of pages: 8