A Discriminative Framework for Clustering via Similarity Functions

Cited by: 0
Authors
Balcan, Maria-Florina [1]
Blum, Avrim [1]
Vempala, Santosh [2]
Affiliations
[1] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA
[2] Georgia Inst Technol, Coll Comp, Atlanta, GA USA
Funding
U.S. National Science Foundation;
Keywords
Clustering; Similarity Functions; Learning;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Problems of clustering data from pairwise similarity information are ubiquitous in Computer Science. Theoretical treatments typically view the similarity information as ground-truth and then design algorithms to (approximately) optimize various graph-based objective functions. However, in most applications, this similarity information is merely based on some heuristic; the ground truth is really the unknown correct clustering of the data points and the real goal is to achieve low error on the data. In this work, we develop a theoretical approach to clustering from this perspective. In particular, motivated by recent work in learning theory that asks "what natural properties of a similarity (or kernel) function are sufficient to be able to learn well?" we ask "what natural properties of a similarity function are sufficient to be able to cluster well?" To study this question we develop a theoretical framework that can be viewed as an analog of the PAC learning model for clustering, where the object of study, rather than being a concept class, is a class of (concept, similarity function) pairs, or equivalently, a property the similarity function should satisfy with respect to the ground truth clustering. We then analyze both algorithmic and information-theoretic issues in our model. While quite strong properties are needed if the goal is to produce a single approximately-correct clustering, we find that a number of reasonable properties are sufficient under two natural relaxations: (a) list clustering: analogous to the notion of list-decoding, the algorithm can produce a small list of clusterings (which a user can select from) and (b) hierarchical clustering: the algorithm's goal is to produce a hierarchy such that the desired clustering is some pruning of this tree (which a user could navigate).
We develop a notion of the clustering complexity of a given property (analogous to notions of capacity in learning theory), that characterizes its information-theoretic usefulness for clustering. We analyze this quantity for several natural game-theoretic and learning-theoretic properties, as well as design new efficient algorithms that are able to take advantage of them. Our algorithms for hierarchical clustering combine recent learning-theoretic approaches with linkage-style methods. We also show how our algorithms can be extended to the inductive case, i.e., by using just a constant-sized sample, as in property testing. The analysis here uses regularity-type results of [20] and [3].
Pages: 671 / +
Page count: 2
Related papers
50 in total
  • [31] Discriminative Subspace Clustering
    Zografos, Vasileios
    Ellis, Liam
    Mester, Rudolf
    2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 2107 - 2114
  • [32] Regularized discriminative clustering
    Kaski, S
    Sinkkonen, J
    Klami, A
    2003 IEEE XIII WORKSHOP ON NEURAL NETWORKS FOR SIGNAL PROCESSING - NNSP'03, 2003, : 289 - 298
  • [33] Hippocampus Parcellation via Discriminative Embedded Clustering of fMRI Functional Connectivity
    Peng, Limin
    Hou, Chenping
    Su, Jianpo
    Shen, Hui
    Wang, Lubin
    Hu, Dewen
    Zeng, Ling-Li
    BRAIN SCIENCES, 2023, 13 (05)
  • [34] Learning with Similarity Functions: a Tensor-Based Framework
    Ragusa, Edoardo
    Gastaldo, Paolo
    Zunino, Rodolfo
    Cambria, Erik
    COGNITIVE COMPUTATION, 2019, 11 (01) : 31 - 49
  • [35] Learning with Similarity Functions: a Tensor-Based Framework
    Edoardo Ragusa
    Paolo Gastaldo
    Rodolfo Zunino
    Erik Cambria
    Cognitive Computation, 2019, 11 : 31 - 49
  • [36] Image Co-Segmentation via Locally Biased Discriminative Clustering
    Liang, Xianpeng
    Wu, Di
    Huang, De-Shuang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2019, 31 (11) : 2228 - 2233
  • [37] Learning Discriminative Representations for Big Data Clustering using Similarity-based Dimensionality Reduction
    Passalis, Nikolaos
    Tefas, Anastasios
    PROCEEDINGS 2018 IEEE 13TH IMAGE, VIDEO, AND MULTIDIMENSIONAL SIGNAL PROCESSING WORKSHOP (IVMSP), 2018,
  • [38] Bilevel fuzzy clustering via adaptive similarity graphs fusion
    Zhao, Yin-Ping
    Dai, Xiangfeng
    Chen, Yongyong
    Zhang, Chuanbin
    Chen, Long
    Zhao, Yue
    INFORMATION SCIENCES, 2024, 662
  • [39] Network Completion via Joint Node Clustering and Similarity Learning
    Rafailidis, Dimitrios
    Crestani, Fabio
    PROCEEDINGS OF THE 2016 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING ASONAM 2016, 2016, : 63 - 68
  • [40] Predicting user preferences via similarity-based clustering
    Qin, Mian
    Buffett, Scott
    Fleming, Michael W.
    ADVANCES IN ARTIFICIAL INTELLIGENCE, 2008, 5032 : 222 - +