An MDL framework for data clustering

Authors
Kontkanen, P [1]
Myllymäki, P [1]
Buntine, W [1]
Rissanen, J [1]
Tirri, H [1]
Affiliations
[1] Aalto Univ, CoSCo, FIN-02015 Helsinki, Finland
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We regard clustering as a data assignment problem where the goal is to partition the data into several nonhierarchical groups of items. To solve this problem, we suggest an information-theoretic framework based on the minimum description length (MDL) principle. Intuitively, the idea is to group together those data items that can be compressed well together, so that the total code length over all the data groups is minimized. One can argue that, since efficient compression is possible only when one has discovered underlying regularities common to all the members of a group, this approach produces an implicitly defined similarity metric between the data items. Formally, the global code length criterion to be optimized is defined using the intuitively appealing universal normalized maximum likelihood (NML) code, which has been shown to produce an optimal compression rate in an explicitly defined manner. The number of groups can be assumed to be unknown, and the problem of deciding the optimal number is formalized as part of the same theoretical framework. In the empirical part of the paper, we present results that demonstrate the validity of the suggested clustering framework.
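The idea in the abstract, scoring a partition by the total code length needed to compress the groups, can be illustrated with a minimal sketch. It is not the paper's exact criterion: it assumes categorical features, encodes the cluster labels and each feature column within each cluster separately with a multinomial NML code, and sums the lengths. The function names (`multinomial_regret`, `nml_code_length`, `clustering_code_length`) and the toy data are hypothetical. The normalizing term uses the known recurrence C(K+2, n) = C(K+1, n) + (n/K) C(K, n) for the multinomial NML normalizer.

```python
import math
from collections import Counter


def multinomial_regret(num_values, n):
    """Log (in nats) of the multinomial NML normalizer C(num_values, n).

    Intended for small n: the direct sum for C(2, n) uses math.comb,
    which overflows float conversion for very large n.
    """
    if n == 0 or num_values == 1:
        return 0.0
    c_prev = 1.0  # C(1, n)
    # C(2, n): sum of maximized likelihoods over all binary count splits.
    c_curr = sum(
        math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
        for h in range(n + 1)
    )
    # Recurrence: C(k + 2, n) = C(k + 1, n) + (n / k) * C(k, n).
    for k in range(1, num_values - 1):
        c_prev, c_curr = c_curr, c_curr + (n / k) * c_prev
    return math.log(c_curr)


def nml_code_length(column, num_values):
    """NML code length (nats) of one categorical sequence."""
    n = len(column)
    counts = Counter(column)
    log_ml = sum(c * math.log(c / n) for c in counts.values())
    return -log_ml + multinomial_regret(num_values, n)


def clustering_code_length(rows, labels, num_clusters, arities):
    """Simplified (assumed) criterion: labels plus per-cluster columns."""
    total = nml_code_length(labels, num_clusters)
    for k in set(labels):
        members = [r for r, y in zip(rows, labels) if y == k]
        for j, arity in enumerate(arities):
            total += nml_code_length([r[j] for r in members], arity)
    return total


# Toy data: two binary features, two obvious groups.
rows = [(0, 0), (0, 0), (0, 0), (1, 1), (1, 1), (1, 1)]
good = clustering_code_length(rows, [0, 0, 0, 1, 1, 1], 2, [2, 2])
bad = clustering_code_length(rows, [0, 1, 0, 1, 0, 1], 2, [2, 2])
print(good < bad)
```

In this toy case the partition that groups identical rows together gets the shorter code length, which is the behavior the abstract describes. Comparing the score across different values of `num_clusters` is one way to sketch the model-selection step, again under the simplifying assumptions above.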
Pages: 323-353
Number of pages: 31