An MDL framework for data clustering

Authors
Kontkanen, P [1]
Myllymäki, P [1]
Buntine, W [1]
Rissanen, J [1]
Tirri, H [1]
Affiliations
[1] Aalto Univ, CoSCo, FIN-02015 Helsinki, Finland
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We regard clustering as a data assignment problem where the goal is to partition the data into several nonhierarchical groups of items. For solving this problem, we suggest an information-theoretic framework based on the minimum description length (MDL) principle. Intuitively, the idea is that we group together those data items that can be compressed well together, so that the total code length over all the data groups is optimized. One can argue that, as efficient compression is possible only when one has discovered underlying regularities that are common to all the members of a group, this approach produces an implicitly defined similarity metric between the data items. Formally, the global code length criterion to be optimized is defined by using the intuitively appealing universal normalized maximum likelihood code, which has been shown to produce an optimal compression rate in an explicitly defined manner. The number of groups can be assumed to be unknown, and the problem of deciding the optimal number is formalized as part of the same theoretical framework. In the empirical part of the paper, we present results that demonstrate the validity of the suggested clustering framework.
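The idea of scoring a partition by its total code length can be illustrated with a small sketch. It makes several assumptions that do not come from the abstract: the data are categorical, the cluster labels and each feature within each cluster are encoded with separate multinomial normalized maximum likelihood (NML) codes, and the lengths are simply added. This factorized score is a simplification chosen for illustration and may differ from the criterion defined in the paper. All function names and the toy data are hypothetical. The normalizer is computed with a known recurrence for the multinomial case, C(k+2, n) = C(k+1, n) + (n/k) C(k, n).

```python
import math
from collections import Counter


def nml_normalizer(k, n):
    """Multinomial NML normalizer C(k, n) for k categories and n items."""
    if n == 0 or k == 1:
        return 1.0
    c_prev = 1.0  # C(1, n)
    # C(2, n) by direct summation; Python evaluates 0.0 ** 0 as 1.0.
    c_curr = sum(
        math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
        for h in range(n + 1)
    )
    # Recurrence: C(j + 2, n) = C(j + 1, n) + (n / j) * C(j, n).
    for j in range(1, k - 1):
        c_prev, c_curr = c_curr, c_curr + (n / j) * c_prev
    return c_curr


def nml_code_length(values, k):
    """NML code length in bits of a sequence over an alphabet of size k."""
    n = len(values)
    counts = Counter(values)
    # Log of the maximized likelihood (in bits).
    log_ml = sum(c * math.log2(c / n) for c in counts.values())
    return -log_ml + math.log2(nml_normalizer(k, n))


def clustering_code_length(rows, labels, n_clusters, arities):
    """Assumed factorized score: labels, then each feature per cluster."""
    total = nml_code_length(labels, n_clusters)
    for c in range(n_clusters):
        members = [row for row, lab in zip(rows, labels) if lab == c]
        for j, arity in enumerate(arities):
            total += nml_code_length([row[j] for row in members], arity)
    return total


# Made-up toy data: two obvious groups over three binary features.
rows = [(0, 0, 0)] * 4 + [(1, 1, 1)] * 4
arities = [2, 2, 2]

single = clustering_code_length(rows, [0] * 8, 1, arities)
good = clustering_code_length(rows, [0, 0, 0, 0, 1, 1, 1, 1], 2, arities)
mixed = clustering_code_length(rows, [0, 1, 0, 1, 0, 1, 0, 1], 2, arities)

print(f"one cluster:        {single:.2f} bits")
print(f"two clusters:       {good:.2f} bits")
print(f"two mixed clusters: {mixed:.2f} bits")
```

On this toy data the partition that separates the two groups receives the shortest code, the single-cluster solution comes next, and the partition that mixes the groups is the longest. Because the number of clusters enters the score through the label code and the per-cluster normalizers, solutions with different numbers of groups can be compared on the same scale, which is the sense in which choosing the number of groups is part of the same criterion.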
Pages: 323-353
Number of pages: 31