Correlation clustering

Cited by: 0
Authors
Bansal, N [1]
Blum, A [1]
Chawla, S [1]
Affiliations
[1] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
We consider the following clustering problem: we have a complete graph on n vertices (items), where each edge (u, v) is labeled either + or - depending on whether u and v have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of + edges within clusters, plus the number of - edges between clusters (equivalently, minimizes the number of disagreements: the number of - edges inside clusters plus the number of + edges between clusters). This formulation is motivated by a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of "agnostic learning" problem. An interesting feature of this clustering formulation is that one does not need to specify the number of clusters k as a separate parameter, as in measures such as k-median, min-sum, or min-max clustering. Instead, in our formulation, the optimal number of clusters could be any value between 1 and n, depending on the edge labels. We look at approximation algorithms for both minimizing disagreements and maximizing agreements. For minimizing disagreements, we give a constant factor approximation. For maximizing agreements, we give a PTAS (polynomial-time approximation scheme). We also show how to extend some of these results to graphs with edge labels in [-1, +1], and give some results for the case of random noise.
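The objective described in the abstract can be made concrete with a small sketch. The code below is not the authors' approximation algorithm; it only scores a given clustering by counting disagreements and agreements on a complete graph whose edges are labeled + or -. The function names, the dictionary-based input format, and the four-vertex example are assumptions chosen for illustration.

```python
from itertools import combinations


def count_disagreements(labels, clustering):
    """Count '-' edges inside clusters plus '+' edges between clusters.

    labels:     dict mapping frozenset({u, v}) -> '+' or '-' (hypothetical format)
    clustering: dict mapping each vertex -> cluster id
    """
    disagreements = 0
    for u, v in combinations(sorted(clustering), 2):
        same_cluster = clustering[u] == clustering[v]
        label = labels[frozenset((u, v))]
        if label == "-" and same_cluster:
            disagreements += 1  # dissimilar pair placed together
        elif label == "+" and not same_cluster:
            disagreements += 1  # similar pair split apart
    return disagreements


def count_agreements(labels, clustering):
    """Agreements are all remaining pairs of the complete graph."""
    n = len(clustering)
    return n * (n - 1) // 2 - count_disagreements(labels, clustering)


# Illustrative labels on 4 vertices: a, b, c are mutually similar;
# d is similar to c but different from a and b.
labels = {
    frozenset(("a", "b")): "+",
    frozenset(("a", "c")): "+",
    frozenset(("b", "c")): "+",
    frozenset(("a", "d")): "-",
    frozenset(("b", "d")): "-",
    frozenset(("c", "d")): "+",
}

two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1}
one_cluster = {"a": 0, "b": 0, "c": 0, "d": 0}
singletons = {"a": 0, "b": 1, "c": 2, "d": 3}

print(count_disagreements(labels, two_clusters))  # 1 (the + edge c-d is cut)
print(count_disagreements(labels, one_cluster))   # 2 (the - edges a-d and b-d)
print(count_disagreements(labels, singletons))    # 4 (every + edge is cut)
```

In this example no clustering reaches zero disagreements, because the labels are inconsistent (c is similar to d, yet a and b are similar to c and different from d). The number of clusters is not an input: it falls out of whichever partition scores best.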
Pages: 238 - 247
Page count: 10