An MDL framework for data clustering

Cited by: 0
Authors
Kontkanen, P [1]
Myllymäki, P [1]
Buntine, W [1]
Rissanen, J [1]
Tirri, H [1]
Affiliation
[1] Aalto Univ, CoSCo, FIN-02015 Helsinki, Finland
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We regard clustering as a data assignment problem where the goal is to partition the data into several nonhierarchical groups of items. To solve this problem, we suggest an information-theoretic framework based on the minimum description length (MDL) principle. Intuitively, the idea is to group together those data items that can be compressed well together, so that the total code length over all the data groups is minimized. One can argue that, because efficient compression is possible only when one has discovered underlying regularities common to all the members of a group, this approach produces an implicitly defined similarity metric between the data items. Formally, the global code length criterion to be optimized is defined using the intuitively appealing universal normalized maximum likelihood (NML) code, which has been shown to produce an optimal compression rate in an explicitly defined manner. The number of groups can be assumed to be unknown, and the problem of deciding the optimal number is formalized as part of the same theoretical framework. In the empirical part of the paper we present results that demonstrate the validity of the suggested clustering framework.
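The code length idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's exact criterion: it assumes one-dimensional categorical data, scores the cluster labels and each cluster's items with separate multinomial normalized maximum likelihood (NML) code lengths, and simply adds them up. The function names (`multinomial_nml_constant`, `nml_code_length`, `clustering_code_length`) and the toy data are hypothetical, and the normalizing sum is computed with the known recurrence C(k + 2, n) = C(k + 1, n) + (n / k) C(k, n).

```python
import math
from collections import Counter


def multinomial_nml_constant(num_values, n):
    """Normalizing sum C(K, n) of the multinomial NML distribution."""
    if n == 0 or num_values == 1:
        return 1.0
    c_prev = 1.0  # C(1, n)
    # C(2, n) by direct summation over the count h of the first value.
    c_curr = sum(
        math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
        for h in range(n + 1)
    )
    # Recurrence: C(k + 2, n) = C(k + 1, n) + (n / k) * C(k, n).
    for k in range(1, num_values - 1):
        c_prev, c_curr = c_curr, c_curr + (n / k) * c_prev
    return c_curr


def nml_code_length(values, num_values):
    """NML code length (in bits) of a sequence of categorical values."""
    n = len(values)
    if n == 0:
        return 0.0
    # Maximized log-likelihood: sum over observed values of c * log2(c / n).
    max_log_lik = sum(c * math.log2(c / n) for c in Counter(values).values())
    return -max_log_lik + math.log2(multinomial_nml_constant(num_values, n))


def clustering_code_length(data, labels, num_clusters, num_values):
    """Simplified total (an assumption): label cost plus per-cluster item cost."""
    total = nml_code_length(labels, num_clusters)
    for cluster in range(num_clusters):
        members = [x for x, y in zip(data, labels) if y == cluster]
        total += nml_code_length(members, num_values)
    return total


# Hypothetical toy data: two binary-valued groups.
data = [0, 0, 0, 0, 1, 1, 1, 1]
aligned = [0, 0, 0, 0, 1, 1, 1, 1]  # clusters match the regularity
mixed = [0, 1, 0, 1, 0, 1, 0, 1]    # clusters ignore the regularity
print(clustering_code_length(data, aligned, 2, 2))
print(clustering_code_length(data, mixed, 2, 2))
```

Under these assumptions, the partition that groups identical items together receives the shorter total code length. Evaluating the same function for different values of `num_clusters` gives a toy version of choosing the number of groups by code length.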
Pages: 323-353
Number of pages: 31
Related papers
50 in total
  • [11] MDL-based time series clustering
    Thanawin Rakthanmanon
    Eamonn J. Keogh
    Stefano Lonardi
    Scott Evans
    Knowledge and Information Systems, 2012, 33 : 371 - 399
  • [12] A Distributed Framework for Online Stream Data Clustering
    Ding, Jiafeng
    Fang, Junhua
    Chao, Pingfu
    Xu, Jiajie
    Zhao, PengPeng
    Zhao, Lei
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2020, PT I, 2020, 12452 : 190 - 204
  • [13] Hierarchical division clustering framework for categorical data
    Wei, Wei
    Liang, Jiye
    Guo, Xinyao
    Song, Peng
    Sun, Yijun
    NEUROCOMPUTING, 2019, 341 : 118 - 134
  • [14] A partial order framework for incomplete data clustering
    Hamdi Yahyaoui
    Hosam AboElfotoh
    Yanjun Shu
    Applied Intelligence, 2023, 53 : 7439 - 7454
  • [15] A categorical data clustering framework on graph representation
    Bai, Liang
    Liang, Jiye
    PATTERN RECOGNITION, 2022, 128
  • [16] CDC: A Simple Framework for Complex Data Clustering
    Kang, Zhao
    Xie, Xuanting
    Li, Bingheng
    Pan, Erlin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [17] A Tensor Framework for Data Stream Clustering and Compression
    Cyganek, Boguslaw
    Wozniak, Michal
    IMAGE ANALYSIS AND PROCESSING (ICIAP 2017), PT I, 2017, 10484 : 163 - 173
  • [18] A partial order framework for incomplete data clustering
    Yahyaoui, Hamdi
    AboElfotoh, Hosam
    Shu, Yanjun
    APPLIED INTELLIGENCE, 2023, 53 (07) : 7439 - 7454
  • [19] A parallel metaheuristic data clustering framework for cloud
    Tsai, Chun-Wei
    Liu, Shi-Jui
    Wang, Yi-Chung
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2018, 116 : 39 - 49
  • [20] An improved Data Clustering algorithm in a Multiobjective Framework
    Thakare, Anuradha D.
    More, M. A.
    2014 ANNUAL IEEE INDIA CONFERENCE (INDICON), 2014,