An MDL framework for data clustering

Times cited: 0
Authors
Kontkanen, P [1]
Myllymäki, P [1]
Buntine, W [1]
Rissanen, J [1]
Tirri, H [1]
Affiliation
[1] Aalto Univ, CoSCo, FIN-02015 Helsinki, Finland
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We regard clustering as a data assignment problem where the goal is to partition the data into several nonhierarchical groups of items. To solve this problem, we suggest an information-theoretic framework based on the minimum description length (MDL) principle. Intuitively, the idea is to group together those data items that can be compressed well together, so that the total code length over all the data groups is minimized. Since efficient compression is possible only when one has discovered underlying regularities that are common to all the members of a group, one can argue that this approach produces an implicitly defined similarity metric between the data items. Formally, the global code length criterion to be optimized is defined by using the intuitively appealing universal normalized maximum likelihood (NML) code, which has been shown to produce an optimal compression rate in an explicitly defined manner. The number of groups can be assumed to be unknown, and the problem of deciding the optimal number is formalized as part of the same theoretical framework. In the empirical part of the paper, we present results that demonstrate the validity of the suggested clustering framework.
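To make the idea of "total code length over all the data groups" concrete, here is a minimal Python sketch. It is not the authors' criterion: the paper defines one NML code for the whole clustering model, whereas this sketch assumes categorical attributes that are independent within each cluster and simply sums separate multinomial NML code lengths (one for the cluster labels, one per cluster and attribute), which still gives a valid total code length. The helper names (multinomial_regret, nml_code_length, clustering_code_length, greedy_cluster, choose_num_clusters), the round-robin start and the greedy reassignment search are all hypothetical illustrations. The multinomial NML normalizer C(K, n) is computed from C(1, n) = 1, a binomial sum for C(2, n), and the recurrence C(K + 2, n) = C(K + 1, n) + (n / K) C(K, n).

```python
import math
from collections import Counter


def multinomial_regret(K: int, n: int) -> float:
    """NML normalizer C(K, n) for a K-valued multinomial and n observations."""
    if n == 0 or K == 1:
        return 1.0

    def log_binomial_term(h: int) -> float:
        # log of binom(n, h) * (h/n)^h * ((n-h)/n)^(n-h), using 0^0 = 1
        t = math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
        if h > 0:
            t += h * math.log(h / n)
        if h < n:
            t += (n - h) * math.log((n - h) / n)
        return t

    c_prev = 1.0  # C(1, n)
    c_cur = sum(math.exp(log_binomial_term(h)) for h in range(n + 1))  # C(2, n)
    # Apply C(k + 2, n) = C(k + 1, n) + (n / k) * C(k, n) until c_cur = C(K, n)
    for k in range(1, K - 1):
        c_prev, c_cur = c_cur, c_cur + (n / k) * c_prev
    return c_cur


def nml_code_length(values, K: int) -> float:
    """NML code length (bits) of a sequence over the alphabet {0, ..., K-1}."""
    n = len(values)
    neg_log_ml = -sum(h * math.log2(h / n) for h in Counter(values).values())
    return neg_log_ml + math.log2(multinomial_regret(K, n))


def clustering_code_length(data, labels, num_clusters: int, arities) -> float:
    """Hypothetical simplified criterion: NML code for the labels, then an NML
    code for each attribute column within each cluster (attributes assumed
    independent given the cluster)."""
    total = nml_code_length(labels, num_clusters)
    for c in range(num_clusters):
        members = [row for row, y in zip(data, labels) if y == c]
        for j, K in enumerate(arities):
            total += nml_code_length([row[j] for row in members], K)
    return total


def greedy_cluster(data, arities, num_clusters: int, max_passes: int = 50):
    """Move one item at a time to the cluster that most shortens the code."""
    labels = [i % num_clusters for i in range(len(data))]  # round-robin start
    cost = clustering_code_length(data, labels, num_clusters, arities)
    for _ in range(max_passes):
        changed = False
        for i in range(len(data)):
            original = labels[i]
            best_c, best_cost = original, cost
            for c in range(num_clusters):
                if c == original:
                    continue
                labels[i] = c
                trial = clustering_code_length(data, labels, num_clusters, arities)
                if trial < best_cost - 1e-12:
                    best_c, best_cost = c, trial
            labels[i] = best_c  # keep the best assignment found for item i
            if best_c != original:
                cost, changed = best_cost, True
        if not changed:
            break
    return labels, cost


def choose_num_clusters(data, arities, max_clusters: int):
    """Pick the number of clusters whose greedy solution has the shortest code."""
    best = None
    for k in range(1, max_clusters + 1):
        labels, cost = greedy_cluster(data, arities, k)
        if best is None or cost < best[2]:
            best = (k, labels, cost)
    return best
```

For example, on twenty rows made of ten copies of (0, 0, 0) and ten copies of (1, 1, 1) with arities [2, 2, 2], choose_num_clusters(data, [2, 2, 2], 3) should pick two clusters that separate the two row types: splitting removes the per-attribute entropy cost, which outweighs the extra bits spent on the labels, while a third cluster only adds label and parametric-complexity cost. Because the search is greedy from a fixed start, it can stop in a local optimum; random restarts would be a natural extension.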
Pages: 323-353
Page count: 31
Related papers (50 in total)
  • [31] A Framework for Clustering Massive Text and Categorical Data Streams
    Aggarwal, Charu C.
    Yu, Philip S.
    PROCEEDINGS OF THE SIXTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2006, : 479 - 483
  • [32] Facilitating data preprocessing by a generic framework: a proposal for clustering
    Kirchner, Kathrin
    Zec, Jelena
    Delibašić, Boris
    Artificial Intelligence Review, 2016, 45 : 271 - 297
  • [33] A Framework for Clustering and Classification of Big Data Using Spark
    Mallios, Xristos
    Vassalos, Vasilis
    Venetis, Tassos
    Vlachou, Akrivi
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2016 CONFERENCES, 2016, 10033 : 344 - 362
  • [34] Evaluation framework of hierarchical clustering methods for binary data
    Tamasauskas, Darius
    Sakalauskas, Virgilijus
    Kriksciuniene, Dalia
    2012 12TH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS (HIS), 2012, : 421 - 426
  • [35] A Framework for Clustering Massive-Domain Data Streams
    Aggarwal, Charu C.
    ICDE: 2009 IEEE 25TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2009, : 102 - 113
  • [36] Clustering Data Stream Under a Belief Function Framework
    Bahri, Maroua
    Elouedi, Zied
    2016 IEEE/ACS 13TH INTERNATIONAL CONFERENCE OF COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2016,
  • [37] An Evolutionary and Visual Framework for Clustering of DNA Microarray Data
    Castellanos-Garzon, Jose A.
    Diaz, Fernando
    JOURNAL OF INTEGRATIVE BIOINFORMATICS, 2013, 10 (03):
  • [38] A Subspace Clustering Extension for the KNIME Data Mining Framework
    Günnemann, Stephan
    Kremer, Hardy
    Musiol, Richard
    Haag, Roman
    Seidl, Thomas
    12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2012), 2012, : 886 - 889
  • [39] SCALE: a scalable framework for efficiently clustering transactional data
    Yan, Hua
    Chen, Keke
    Liu, Ling
    Yi, Zhang
    DATA MINING AND KNOWLEDGE DISCOVERY, 2010, 20 (01) : 1 - 27
  • [40] A Framework for Data Clustering of Large Datasets in a Distributed Environment
    Swapna, Ch. Swetha
    Kumar, V. Vijaya
    Murthy, J. V. R.
    PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION TECHNOLOGIES, IC3T 2015, VOL 1, 2016, 379 : 425 - 441