Efficient estimation of the number of clusters for high-dimension data

Authors
Kasapis, Spiridon [1 ,2 ,6 ]
Zhang, Geng [4 ]
Smereka, Jonathon M. [5 ]
Vlahopoulos, Nickolas [3 ]
Affiliations
[1] NASA, Adv Supercomp Div, Ames Res Ctr, Moffett Field, CA 94035 USA
[2] Univ Michigan, Ann Arbor, MI USA
[3] Univ Michigan, Naval Architecture & Marine Engn, Ann Arbor, MI USA
[4] Michigan Engn Serv, Ann Arbor, MI USA
[5] US Army, DEVCOM Ground Vehicle Syst Ctr, Warren, MI USA
[6] NASA, Adv Supercomp Div, Ames Research Ctr, N258, 258 Allen Rd, Moffett Field, CA 94035 USA
Keywords
Clustering; K-means; number of clusters; initializations; unsupervised learning schema; computer vision; variance ratio criterion; MIXTURE;
DOI
10.1177/15485129231214569
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
The exponential growth of digital image data has created a need for efficient content management and retrieval tools. Currently, there is a lack of tools for processing collected unlabeled data in a schematic manner. K-means is one of the most widely used clustering methods and has been applied in a variety of fields, including image sorting. Although it is a useful tool for image management, K-means is heavily influenced by its initialization, most importantly the need to know the number of clusters a priori. Several methods have been proposed for identifying the correct number of clusters for K-means, one of them being the variance ratio criterion (VRC). Despite its popularity, the VRC method has two important shortcomings: it yields good results only when the data dimensionality is low, and it does not scale well to a large number of clusters, which makes it very difficult to use in computer vision applications. We propose an extension to the VRC method that works for large numbers of clusters and high-dimensional data sets and is therefore suitable for image data sets.
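For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of the classic VRC (also known as the Calinski-Harabasz index) used to choose the number of clusters for K-means. It is not the extension proposed in the paper, whose details the abstract does not give. For n points split into k clusters, the score is (B / (k - 1)) / (W / (n - k)), where B is the between-cluster and W the within-cluster sum of squared distances. The function names, the toy data, and the deterministic farthest-point seeding are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with farthest-point seeding (an assumption of this
    sketch, chosen so the example is deterministic)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = assign(points, centers)
    for _ in range(max_iter):
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the old center if a cluster happens to become empty.
            new_centers.append(centroid(members) if members else centers[j])
        centers = new_centers
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return labels

def vrc(points, labels):
    """Classic variance ratio criterion for a given partition.
    k is taken as the number of non-empty clusters in `labels`."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    n, k = len(points), len(clusters)
    if k < 2 or k >= n:
        raise ValueError("VRC needs 2 <= k < n")
    overall = centroid(points)
    between = sum(len(m) * sq_dist(centroid(m), overall)
                  for m in clusters.values())
    within = sum(sq_dist(p, centroid(m))
                 for m in clusters.values() for p in m)
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))

def estimate_k(points, k_values):
    """Score every candidate k and return the one with the highest VRC."""
    scores = {k: vrc(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores

# Hypothetical toy data: three tight, well-separated 2-D groups.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 9), (5, 10), (6, 9), (6, 10)]

best_k, scores = estimate_k(POINTS, range(2, 7))
print(best_k, scores)
```

On low-dimensional, well-separated data such as this toy set, the highest score coincides with the visible number of groups. The abstract's point is that this behaviour degrades for high-dimensional data and large cluster counts, which is what the proposed extension targets.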
Pages: 13