Correlation clustering

Cited by: 0
Authors
Bansal, N [1]
Blum, A [1]
Chawla, S [1]
Affiliations
[1] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
We consider the following clustering problem: we have a complete graph on n vertices (items), where each edge (u, v) is labeled either + or - depending on whether u and v have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of + edges within clusters, plus the number of - edges between clusters (equivalently, minimizes the number of disagreements: the number of - edges inside clusters plus the number of + edges between clusters). This formulation is motivated from a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of "agnostic learning" problem. An interesting feature of this clustering formulation is that one does not need to specify the number of clusters k as a separate parameter as in measures such as k-median or min-sum or min-max clustering. Instead, in our formulation, the optimal number of clusters could be any value between 1 and n, depending on the edge labels. We look at approximation algorithms for both minimizing disagreements and for maximizing agreements. For minimizing disagreements, we give a constant factor approximation. For maximizing agreements we give a PTAS. We also show how to extend some of these results to graphs with edge labels in [-1, +1], and give some results for the case of random noise.
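The objective described in the abstract can be illustrated with a minimal sketch. It only evaluates a given clustering; it is not the approximation algorithm or the PTAS from the paper. The function names, the representation of + edges as a list of pairs, and the 4-vertex instance are assumptions made for illustration.

```python
from itertools import combinations


def count_disagreements(n, positive_edges, cluster_of):
    """Count - edges inside clusters plus + edges between clusters.

    n              -- number of vertices, labeled 0..n-1 (assumed labeling)
    positive_edges -- pairs (u, v) labeled +; every other pair is labeled -
    cluster_of     -- cluster_of[v] is the cluster id assigned to vertex v
    """
    positive = {frozenset(edge) for edge in positive_edges}
    disagreements = 0
    for u, v in combinations(range(n), 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        is_positive = frozenset((u, v)) in positive
        # A + edge that is cut, or a - edge kept together, is a disagreement.
        if same_cluster != is_positive:
            disagreements += 1
    return disagreements


def count_agreements(n, positive_edges, cluster_of):
    """Agreements are all edges of the complete graph that are not disagreements."""
    total_edges = n * (n - 1) // 2
    return total_edges - count_disagreements(n, positive_edges, cluster_of)


# Hypothetical instance: + edges form a triangle on {0, 1, 2} plus the edge (2, 3).
positive_edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
two_clusters = [0, 0, 0, 1]   # clusters {0, 1, 2} and {3}
one_cluster = [0, 0, 0, 0]    # a single cluster
singletons = [0, 1, 2, 3]     # every vertex alone

print(count_disagreements(4, positive_edges, two_clusters))
print(count_disagreements(4, positive_edges, one_cluster))
print(count_disagreements(4, positive_edges, singletons))
```

In this instance the three clusterings use different numbers of clusters and are compared by the same measure, which reflects the point that k is not supplied as a separate parameter.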
Pages: 238-247
Number of pages: 10
Related papers
50 in total
  • [31] Combination clustering for Web correlation
    Takahashi, K
    Miura, T
    Shioya, I
    2005 IEEE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS AND SIGNAL PROCESSING (PACRIM), 2005 : 434 - 437
  • [32] Correlation Clustering in Data Streams
    Kook Jin Ahn
    Graham Cormode
    Sudipto Guha
    Andrew McGregor
    Anthony Wirth
    Algorithmica, 2021, 83 : 1980 - 2017
  • [33] Motif and Hypergraph Correlation Clustering
    Li, Pan
    Puleo, Gregory J.
    Milenkovic, Olgica
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2020, 66 (05) : 3065 - 3078
  • [34] Correlation clustering: a parallel approach?
    Aszalos, Laszlo
    Bako, Maria
    PROCEEDINGS OF THE 2017 FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS (FEDCSIS), 2017 : 403 - 406
  • [35] Correlation Clustering in Data Streams
    Ahn, Kook Jin
    Cormode, Graham
    Guha, Sudipto
    McGregor, Andrew
    Wirth, Anthony
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 2237 - 2246
  • [36] Correlation clustering with partial information
    Demaine, ED
    Immorlica, N
    APPROXIMATION, RANDOMIZATION, AND COMBINATORIAL OPTIMIZATION, 2003, 2764 : 1 - 13
  • [37] Correlation Clustering with Noisy Input
    Mathieu, Claire
    Schudy, Warren
    PROCEEDINGS OF THE TWENTY-FIRST ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2010, 135 : 712 - 728
  • [38] Robust Online Correlation Clustering
    Lattanzi, Silvio
    Moseley, Benjamin
    Vassilvitskii, Sergei
    Wang, Yuyan
    Zhou, Rudy
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [39] Clustering Coefficients for Correlation Networks
    Masuda, Naoki
    Sakaki, Michiko
    Ezaki, Takahiro
    Watanabe, Takamitsu
    FRONTIERS IN NEUROINFORMATICS, 2018, 12
  • [40] Fusion Moves for Correlation Clustering
    Beier, Thorsten
    Hamprecht, Fred A.
    Kappes, Joerg H.
    2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2015 : 3507 - 3516