Using the Negentropy Increment to Determine the Number of Clusters

Cited by: 0
Authors
Lago-Fernandez, Luis F. [1]
Corbacho, Fernando [2]
Affiliations
[1] Univ Autonoma Madrid, Escuela Politecn Super, Dept Ingn Informat, E-28049 Madrid, Spain
[2] Cognodata Consulting, E-28010 Madrid, Spain
Keywords
Crisp clustering; Cluster validation; Negentropy; MIXTURE MODEL; VALIDITY; NEC;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce a new validity index for crisp clustering that is based on the average normality of the clusters. A normal cluster is optimal in the sense of maximum uncertainty, or minimum structure, and so performing further partitions on it will not reveal additional substructures. To characterize the normality of a cluster we use the negentropy, a standard measure of distance to normality which evaluates the difference between the cluster's entropy and the entropy of a normal distribution with the same covariance matrix. Although the definition of the negentropy involves the differential entropy, we show that it is possible to avoid its explicit computation by considering only negentropy increments with respect to the initial data distribution. The resulting negentropy increment validity index only requires the computation of determinants of covariance matrices. We have applied the index to randomly generated problems, and show that it provides better results than other indices for the assessment of the number of clusters.
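The abstract states that the index needs only determinants of covariance matrices, but this record does not give the formula. A minimal sketch follows, assuming the expression obtained by subtracting the negentropy of the whole data set from the cluster-averaged negentropy when clusters occupy disjoint regions: ΔJ = ½ Σᵢ pᵢ log|Σᵢ| − ½ log|Σ₀| − Σᵢ pᵢ log pᵢ. The function name `negentropy_increment`, its arguments, and the convention that lower values indicate a better partition are assumptions of this sketch, not taken from the paper.

```python
import numpy as np


def negentropy_increment(X, labels):
    """Hypothetical helper: negentropy increment of a crisp partition.

    Assumed form (not quoted from the paper):
        dJ = 0.5 * sum_i p_i * log|S_i| - 0.5 * log|S_0| - sum_i p_i * log(p_i)
    where S_0 is the covariance of all data, S_i the covariance of
    cluster i, and p_i the fraction of points assigned to cluster i.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, d = X.shape

    # Log-determinant of the covariance of the full data set.
    cov0 = np.cov(X, rowvar=False).reshape(d, d)
    _, logdet0 = np.linalg.slogdet(cov0)
    delta = -0.5 * logdet0

    for k in np.unique(labels):
        Xk = X[labels == k]
        p = len(Xk) / n  # empirical prior of cluster k
        cov_k = np.cov(Xk, rowvar=False).reshape(d, d)
        _, logdet_k = np.linalg.slogdet(cov_k)
        delta += 0.5 * p * logdet_k - p * np.log(p)
    return delta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated unit-variance Gaussian blobs in 2D.
    X = np.vstack([
        rng.normal(loc=[-5, 0], size=(500, 2)),
        rng.normal(loc=[5, 0], size=(500, 2)),
    ])
    true_labels = np.repeat([0, 1], 500)
    random_labels = rng.integers(0, 2, size=1000)
    print("true partition:  ", negentropy_increment(X, true_labels))
    print("random partition:", negentropy_increment(X, random_labels))
```

Under these assumptions, one would compute the value for candidate partitions with different numbers of clusters and prefer the partition with the lowest increment. Clusters with very few points give poorly estimated covariance matrices, so the sketch is only meaningful when each cluster has clearly more points than dimensions.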
Pages: 448 / +
Page count: 3
Related papers
50 in total
  • [1] The effect of low number of points in clustering validation via the negentropy increment
    Lago-Fernandez, Luis F.
    Sanchez-Montanes, Manuel
    Corbacho, Fernando
    [J]. NEUROCOMPUTING, 2011, 74 (16) : 2657 - 2664
  • [2] Using the stability of objects to determine the number of clusters in datasets
    Lord, Etienne
    Willems, Matthieu
    Lapointe, Francois-Joseph
    Makarenkov, Vladimir
    [J]. INFORMATION SCIENCES, 2017, 393 : 29 - 46
  • [3] Kernel MDL to determine the number of clusters
    Kyrgyzov, Ivan O.
    Kyrgyzov, Olexiy O.
    Maitre, Henri
    Campedel, Marine
    [J]. MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, PROCEEDINGS, 2007, 4571 : 203 - +
  • [4] Determine the number of clusters by data augmentation
    Luo, Wei
    [J]. ELECTRONIC JOURNAL OF STATISTICS, 2022, 16 (02) : 3910 - 3936
  • [5] DETERMINE OPTIMUM NUMBER OF COMPACT OVERLAPPED CLUSTERS USING FRLVQ TECHNIQUE
    Xu Wenhuan Huang Qiang Ji Zhen Zhang Jihong (Faculty of Information Engineering
    [J]. Journal of Electronics(China), 2005, (06) : 110 - 114
  • [6] DETERMINE OPTIMUM NUMBER OF COMPACT OVERLAPPED CLUSTERS USING FRLVQ TECHNIQUE
    Xu Wenhuan Huang Qiang Ji Zhen Zhang Jihong Faculty of Information Engineering Shenzhen University Shenzhen China
    [J]. Journal of Electronics, 2005, (06)
  • [7] An Approach to Determine the Number of Clusters for Clustering Algorithms
    Dinh Thuan Nguyen
    Huan Doan
    [J]. COMPUTATIONAL COLLECTIVE INTELLIGENCE - TECHNOLOGIES AND APPLICATIONS, PT I, 2012, 7653 : 485 - 494
  • [8] An Adaptive Method to Determine the Number of Clusters in Clustering Process
    Huan Doan
    Dinh Thuan Nguyen
    [J]. 2014 INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCES (ICCOINS), 2014,
  • [9] An automatic method to determine the number of clusters using decision-theoretic rough set
    Yu, Hong
    Liu, Zhanguo
    Wang, Guoyin
    [J]. INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2014, 55 (01) : 101 - 115
  • [10] An Evolving Fuzzy Model to Determine an Optimal Number of Data Stream Clusters
    Al-Khamees, Hussein A. A.
    Al-A'araji, Nabeel
    Al-Shamery, Eman S.
    [J]. INTERNATIONAL JOURNAL OF FUZZY LOGIC AND INTELLIGENT SYSTEMS, 2022, 22 (03) : 267 - 275