Efficient estimation of the number of clusters for high-dimension data

Authors
Kasapis, Spiridon [1 ,2 ,6 ]
Zhang, Geng [4 ]
Smereka, Jonathon M. [5 ]
Vlahopoulos, Nickolas [3 ]
Affiliations
[1] NASA, Adv Supercomp Div, Ames Res Ctr, Moffett Field, CA 94035 USA
[2] Univ Michigan, Ann Arbor, MI USA
[3] Univ Michigan, Naval Architecture & Marine Engn, Ann Arbor, MI USA
[4] Michigan Engn Serv, Ann Arbor, MI USA
[5] US Army, DEVCOM Ground Vehicle Syst Ctr, Warren, MI USA
[6] NASA, Adv Supercomp Div, Ames Research Ctr, N258, 258 Allen Rd, Moffett Field, CA 94035 USA
Keywords
Clustering; K-means; number of clusters; initializations; unsupervised learning schema; computer vision; variance ratio criterion; MIXTURE
DOI
10.1177/15485129231214569
Chinese Library Classification (CLC) number
T [Industrial technology]
Subject classification code
08
Abstract
The exponential growth of digital image data has created a need for efficient content management and retrieval tools. Currently, there is a lack of tools for processing the collected unlabeled data in a systematic manner. K-means is one of the most widely used clustering methods and has been applied in a variety of fields, one of them being image sorting. Although it is a useful tool for image management, the K-means method is heavily influenced by its initializations, the most important being the need to know the number of clusters a priori. A number of methods have been proposed for identifying the correct number of clusters for K-means, one of them being the variance ratio criterion (VRC). Despite its popularity, the VRC method has two important shortcomings: it yields good results only when the data dimensionality is low, and it does not scale well to a large number of clusters, which makes it difficult to use in computer vision applications. We propose an extension to the VRC method that works for larger numbers of clusters and high-dimensional data sets and is therefore suitable for image data sets.
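The VRC mentioned in the abstract is the classic Calinski-Harabasz index: for a partition of n points into k clusters, VRC(k) = [B_k / (k - 1)] / [W_k / (n - k)], where B_k is the between-cluster and W_k the within-cluster sum of squares, and the k with the largest VRC is taken as the estimate. The minimal sketch below illustrates only this baseline, not the extension proposed in the paper, whose details the abstract does not give. The helper names (vrc, kmeans, estimate_k, make_blobs), the k-means++ seeding with 10 restarts, and the synthetic Gaussian test data are illustrative assumptions; only the Python standard library is used.

```python
import random

# Illustrative sketch of the classic VRC (Calinski-Harabasz) baseline for choosing k.
# NOT the paper's proposed extension; helper names and k-means++ seeding are assumptions.

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def groups(points, labels, k):
    """Split points into k lists according to their cluster labels."""
    out = [[] for _ in range(k)]
    for p, lab in zip(points, labels):
        out[lab].append(p)
    return out

def within_ss(points, labels, k):
    """W_k: sum of squared distances from each point to its cluster mean."""
    total = 0.0
    for g in groups(points, labels, k):
        if g:
            c = mean(g)
            total += sum(sq_dist(p, c) for p in g)
    return total

def vrc(points, labels, k):
    """VRC(k) = (B_k / (k - 1)) / (W_k / (n - k)); requires 2 <= k < n."""
    n = len(points)
    overall = mean(points)
    # B_k: size-weighted squared distances of cluster means to the global mean
    between = sum(len(g) * sq_dist(mean(g), overall)
                  for g in groups(points, labels, k) if g)
    within = within_ss(points, labels, k)
    if within == 0:
        return float("inf")  # every point sits exactly on its cluster mean
    return (between / (k - 1)) / (within / (n - k))

def kmeans(points, k, rng, n_init=10, max_iter=100):
    """Lloyd's K-means with k-means++ seeding; keeps the run with the lowest W_k."""
    best_labels, best_w = None, float("inf")
    for _ in range(n_init):
        # k-means++: draw each new center with probability proportional to D(x)^2
        centers = [list(rng.choice(points))]
        while len(centers) < k:
            d2 = [min(sq_dist(p, c) for c in centers) for p in points]
            if sum(d2) > 0:
                pick = rng.choices(points, weights=d2)[0]
            else:  # all points coincide with existing centers
                pick = rng.choice(points)
            centers.append(list(pick))
        for _ in range(max_iter):
            # Assignment step: nearest center for every point
            labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
            # Update step: move each center to its cluster mean (keep it if empty)
            new_centers = [mean(g) if g else centers[j]
                           for j, g in enumerate(groups(points, labels, k))]
            if new_centers == centers:
                break
            centers = new_centers
        w = within_ss(points, labels, k)
        if w < best_w:
            best_labels, best_w = labels, w
    return best_labels

def estimate_k(points, k_values, seed=0):
    """Return the k (each >= 2) that maximizes VRC, plus the score for every k."""
    rng = random.Random(seed)
    scores = {k: vrc(points, kmeans(points, k, rng), k) for k in k_values}
    return max(scores, key=scores.get), scores

def make_blobs(centers, n_per, sigma, seed):
    """Synthetic isotropic Gaussian clusters (hypothetical test data only)."""
    rng = random.Random(seed)
    return [tuple(rng.gauss(mu, sigma) for mu in c)
            for c in centers for _ in range(n_per)]

if __name__ == "__main__":
    d = 8  # dimensionality of the synthetic data
    centers = [[0.0] * d, [10.0] + [0.0] * (d - 1), [0.0, 10.0] + [0.0] * (d - 2)]
    data = make_blobs(centers, n_per=30, sigma=1.0, seed=1)
    best_k, scores = estimate_k(data, range(2, 7))
    for k, s in scores.items():
        print(f"k={k}: VRC={s:.1f}")
    print("estimated number of clusters:", best_k)
```

On a few well-separated, low-count clusters like these, the VRC maximum falls at the generating k. According to the abstract, it is exactly as dimensionality and the number of clusters grow that this plain selection rule stops working reliably, which is the regime the proposed extension targets.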
Pages: 13