Efficient estimation of the number of clusters for high-dimension data

Cited by: 0
Authors
Kasapis, Spiridon [1, 2, 6]
Zhang, Geng [4]
Smereka, Jonathon M. [5]
Vlahopoulos, Nickolas [3]
Affiliations
[1] NASA, Adv Supercomp Div, Ames Res Ctr, Moffett Field, CA 94035 USA
[2] Univ Michigan, Ann Arbor, MI USA
[3] Univ Michigan, Naval Architecture & Marine Engn, Ann Arbor, MI USA
[4] Michigan Engn Serv, Ann Arbor, MI USA
[5] US Army, DEVCOM Ground Vehicle Syst Ctr, Warren, MI USA
[6] NASA, Adv Supercomp Div, Ames Research Ctr, N258, 258 Allen Rd, Moffett Field, CA 94035 USA
Keywords
Clustering; K-means; number of clusters; initializations; unsupervised learning schema; computer vision; variance ratio criterion; MIXTURE;
DOI
10.1177/15485129231214569
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08;
Abstract
The exponential growth of digital image data has created a need for efficient content management and retrieval tools. Currently, there is a lack of tools for processing collected unlabeled data in a schematic manner. K-means is one of the most widely used clustering methods and has been applied in a variety of fields, including image sorting. Although it is a useful tool for image management, the K-means method is heavily influenced by its initialization, most importantly by the need to know the number of clusters a priori. Several methods have been proposed for identifying the correct number of clusters for K-means, one of them being the variance ratio criterion (VRC). Despite its popularity, the VRC method has two important shortcomings: it yields good results only when the data dimensionality is low, and it does not scale well to a large number of clusters, which makes it very difficult to use in computer vision applications. We propose an extension to the VRC method that works for larger numbers of clusters and for high-dimensional data sets, and is therefore suitable for image data sets.
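To make the baseline concrete, the following is a minimal sketch of how the classic VRC (also known as the Calinski-Harabasz index) can be used to pick the number of clusters for K-means. It is not the extension proposed in the paper, whose details are not given in this abstract. The toy data, the deterministic farthest-point initialization, and all function names (`kmeans`, `vrc`, `sq_dist`, `mean`) are assumptions made for illustration only.

```python
# Illustrative sketch only: classic VRC, not the paper's proposed extension.

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm with a deterministic farthest-point start
    (an assumption chosen here so that the example is reproducible)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean(members)
    return labels

def vrc(points, labels):
    """VRC = [B / (k - 1)] / [W / (n - k)], where B is the between-cluster
    and W the within-cluster sum of squared distances."""
    n, ids = len(points), sorted(set(labels))
    k = len(ids)
    overall = mean(points)
    between = within = 0.0
    for j in ids:
        members = [p for p, lab in zip(points, labels) if lab == j]
        center = mean(members)
        between += len(members) * sq_dist(center, overall)
        within += sum(sq_dist(p, center) for p in members)
    return (between / (k - 1)) / (within / (n - k))

# Toy data (hypothetical): three well-separated 3x3 grids of 2-D points.
offsets = [(dx, dy) for dx in (-0.5, 0.0, 0.5) for dy in (-0.5, 0.0, 0.5)]
points = [[cx + dx, cy + dy]
          for cx, cy in [(0, 0), (10, 0), (0, 10)]
          for dx, dy in offsets]

# Score each candidate k and keep the one with the highest VRC.
scores = {k: vrc(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k)
```

On this low-dimensional toy data the score peaks at k = 3, the true number of groups. The abstract's point is that this simple "take the maximum" rule becomes unreliable as the dimensionality and the number of clusters grow, which is the regime the proposed extension targets.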
Pages: 13