Non-redundant data clustering

Citations: 27
Authors
Gondek, D [1]
Hofmann, T [1]
Affiliation
[1] Brown Univ, Dept Comp Sci, Providence, RI 02912 USA
DOI
10.1109/ICDM.2004.10104
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data clustering is a popular approach for automatically finding classes, concepts, or groups of patterns. In practice this discovery process should avoid redundancies with existing knowledge about class structures or groupings, and reveal novel, previously unknown aspects of the data. In order to deal with this problem, we present an extension of the information bottleneck framework, called coordinated conditional information bottleneck, which takes negative relevance information into account by maximizing a conditional mutual information score subject to constraints. Algorithmically, one can apply an alternating optimization scheme that can be used in conjunction with different types of numeric and non-numeric attributes. We present experimental results for applications in text mining and computer vision.
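As a rough illustration of the conditional mutual information score mentioned in the abstract, the sketch below computes I(C;Y|Z) for discrete variables. It assumes C is a candidate cluster assignment, Y a relevance variable, and Z the already-known grouping; these roles, the function name, and the toy tables are assumptions made for illustration. It is not the authors' algorithm and leaves out the constraints and the alternating optimization.

```python
import numpy as np

def conditional_mutual_information(p_cyz):
    """Return I(C;Y|Z) in nats for a joint probability table p[c, y, z].

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    p = np.asarray(p_cyz, dtype=float)
    p = p / p.sum()                           # normalise to a distribution
    p_z = p.sum(axis=(0, 1), keepdims=True)   # p(z),   shape (1, 1, Z)
    p_cz = p.sum(axis=1, keepdims=True)       # p(c,z), shape (C, 1, Z)
    p_yz = p.sum(axis=0, keepdims=True)       # p(y,z), shape (1, Y, Z)
    num = p * p_z                             # p(c,y,z) p(z)
    den = p_cz * p_yz                         # p(c,z) p(y,z)
    mask = p > 0                              # skip zero cells (0 log 0 = 0)
    return float(np.sum(p[mask] * np.log(num[mask] / den[mask])))

# Toy case 1: C and Y both just copy the known grouping Z (fully redundant).
redundant = np.zeros((2, 2, 2))
redundant[0, 0, 0] = 0.5
redundant[1, 1, 1] = 0.5

# Toy case 2: C matches Y, and both are independent of Z (novel structure).
novel = np.zeros((2, 2, 2))
for z in range(2):
    novel[0, 0, z] = 0.25
    novel[1, 1, z] = 0.25

print(conditional_mutual_information(redundant))
print(conditional_mutual_information(novel))
```

In the first toy table the clustering only restates Z, so the score is zero; in the second it captures structure in Y that Z does not explain, and the score equals log 2 nats.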
Pages: 75-82
Page count: 8
Related papers
50 in total
  • [1] Non-redundant data clustering
    David Gondek
    Thomas Hofmann
    [J]. Knowledge and Information Systems, 2007, 12 : 1 - 24
  • [2] Non-redundant data clustering
    Gondek, David
    Hofmann, Thomas
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2007, 12 (01) : 1 - 24
  • [3] Deep Embedded Non-Redundant Clustering
    Miklautz, Lukas
    Mautz, Dominik
    Altinigneli, Muzaffer Can
    Boehm, Christian
    Plant, Claudia
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 5174 - 5181
  • [4] Non-redundant multiple clustering by nonnegative matrix factorization
    Yang, Sen
    Zhang, Lijun
    [J]. MACHINE LEARNING, 2017, 106 (05) : 695 - 712
  • [5] Information-Theoretic Non-redundant Subspace Clustering
    Hubig, Nina
    Plant, Claudia
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2017, PT I, 2017, 10234 : 198 - 209
  • [6] Non-redundant multiple clustering by nonnegative matrix factorization
    Sen Yang
    Lijun Zhang
    [J]. Machine Learning, 2017, 106 : 695 - 712
  • [7] Detecting outliers in non-redundant diffraction data
    Read, RJ
    [J]. ACTA CRYSTALLOGRAPHICA SECTION D-BIOLOGICAL CRYSTALLOGRAPHY, 1999, 55 : 1759 - 1764
  • [8] Phylogenomic clustering for selecting non-redundant genomes for comparative genomics
    Moreno-Hagelsieb, Gabriel
    Wang, Zilin
    Walsh, Stephanie
    ElSherbiny, Aisha
    [J]. BIOINFORMATICS, 2013, 29 (07) : 947 - 949
  • [9] Non-redundant multi-view clustering via orthogonalization
    Cui, Ying
    Fern, Xiaoli Z.
    Dy, Jennifer G.
    [J]. ICDM 2007: PROCEEDINGS OF THE SEVENTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2007, : 133 - +
  • [10] Relevant Subspace Clustering: Mining the Most Interesting Non-Redundant Concepts in High Dimensional Data
    Mueller, Emmanuel
    Assent, Ira
    Guennemann, Stephan
    Krieger, Ralph
    Seidl, Thomas
    [J]. 2009 9TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2009, : 377 - +