Correlation clustering

Authors
Bansal, N [1]
Blum, A [1]
Chawla, S [1]
Affiliation
[1] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
We consider the following clustering problem: we have a complete graph on n vertices (items), where each edge (u, v) is labeled either + or - depending on whether u and v have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of + edges within clusters plus the number of - edges between clusters (equivalently, minimizes the number of disagreements: the number of - edges inside clusters plus the number of + edges between clusters). This formulation is motivated by a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of "agnostic learning" problem. An interesting feature of this clustering formulation is that one does not need to specify the number of clusters k as a separate parameter, as in measures such as k-median or min-sum or min-max clustering. Instead, in our formulation, the optimal number of clusters could be any value between 1 and n, depending on the edge labels. We look at approximation algorithms for both minimizing disagreements and maximizing agreements. For minimizing disagreements, we give a constant-factor approximation. For maximizing agreements, we give a PTAS. We also show how to extend some of these results to graphs with edge labels in [-1, +1], and give some results for the case of random noise.
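The objective described in the abstract can be made concrete with a small sketch. The Python below is an illustration under stated assumptions and is not the authors' algorithm. Vertices are numbered 0..n-1, the + edges are given as a set of pairs (u, v) with u < v, every other pair counts as a - edge, and a clustering is a list of cluster ids. The function names (`disagreements`, `agreements`, `set_partitions`, `best_clustering`) are hypothetical. The exhaustive search enumerates every set partition, so its running time is exponential in n. It serves only to show that the number of clusters follows from the labels rather than being a parameter. The paper's constant-factor approximation and PTAS are not reproduced here.

```python
from itertools import combinations
from typing import Iterator, List, Set, Tuple

Edge = Tuple[int, int]  # assumption: + edges stored as (u, v) with u < v


def disagreements(n: int, plus_edges: Set[Edge], cluster_of: List[int]) -> int:
    """Count - edges inside clusters plus + edges between clusters."""
    cost = 0
    for u, v in combinations(range(n), 2):  # every pair of the complete graph, u < v
        same_cluster = cluster_of[u] == cluster_of[v]
        is_plus = (u, v) in plus_edges  # any pair not listed is treated as a - edge
        if same_cluster != is_plus:  # - edge inside a cluster, or + edge across clusters
            cost += 1
    return cost


def agreements(n: int, plus_edges: Set[Edge], cluster_of: List[int]) -> int:
    """Count + edges inside clusters plus - edges between clusters."""
    return n * (n - 1) // 2 - disagreements(n, plus_edges, cluster_of)


def set_partitions(n: int) -> Iterator[List[int]]:
    """Yield every clustering of 0..n-1 exactly once, as a restricted growth string."""
    def extend(prefix: List[int], used: int) -> Iterator[List[int]]:
        if len(prefix) == n:
            yield list(prefix)
            return
        for c in range(used + 1):  # join an existing cluster or open a new one
            prefix.append(c)
            yield from extend(prefix, max(used, c + 1))
            prefix.pop()
    yield from extend([], 0)


def best_clustering(n: int, plus_edges: Set[Edge]) -> List[int]:
    """Exact minimum-disagreement clustering by brute force (tiny n only)."""
    return min(set_partitions(n), key=lambda c: disagreements(n, plus_edges, c))


if __name__ == "__main__":
    # Two + groups, {0, 1, 2} and {3, 4}, joined by one noisy + edge (2, 3).
    plus = {(0, 1), (0, 2), (1, 2), (3, 4), (2, 3)}
    best = best_clustering(5, plus)
    print(best)                          # [0, 0, 0, 1, 1]
    print(disagreements(5, plus, best))  # 1
    print(agreements(5, plus, best))     # 9
```

In this example, the two + groups survive as separate clusters, and the noisy edge (2, 3) becomes the single disagreement. If every pair is labeled +, the same search returns one cluster. If every pair is labeled -, it returns n singletons. This illustrates the abstract's point that the optimal number of clusters can be anything from 1 to n.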
Pages: 238-247
Page count: 10
Related papers
50 in total
  • [41] Online and Consistent Correlation Clustering
    Cohen-Addad, Vincent
    Lattanzi, Silvio
    Maggiori, Andreas
    Parotsidis, Nikos
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [42] MCC - MULTIPLE CORRELATION CLUSTERING
    DOYLE, JR
    INTERNATIONAL JOURNAL OF MAN-MACHINE STUDIES, 1992, 37 (06): 751-765
  • [43] Correlation clustering of graphs and integers
    Akiyama, S.
    Aszalos, L.
    Hajdu, L.
    Petho, A.
    INFOCOMMUNICATIONS JOURNAL, 2014, 6 (04): 3-12
  • [44] PIONIC FUSION AND THE CLUSTERING CORRELATION
    KAJINO, T
    TOKI, H
    KUBO, K
    PHYSICAL REVIEW C, 1987, 35 (04): 1370-1381
  • [45] A note on the inapproximability of correlation clustering
    Tan, Jinsong
    INFORMATION PROCESSING LETTERS, 2008, 108 (05): 331-335
  • [46] Correlation Clustering with Local Objectives
    Kalhan, Sanchit
    Makarychev, Konstantin
    Zhou, Timothy
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [47] Correlation Clustering for Learning Mixtures of Canonical Correlation Models
    Fern, Xiaoli Z.
    Brodley, Carla E.
    Friedl, Mark A.
    PROCEEDINGS OF THE FIFTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2005: 439-448
  • [48] Correlation Clustering for Crosslingual Link Detection
    Van Gael, Jurgen
    Zhu, Xiaojin
    20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007: 1744-1749
  • [49] Clustering based ensemble correlation tracking
    Zhu, Guibo
    Wang, Jinqiao
    Lu, Hanqing
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2016, 153: 55-63
  • [50] Multivariate GARCH Models with Correlation Clustering
    So, Mike K. P.
    Yip, Iris W. H.
    JOURNAL OF FORECASTING, 2012, 31 (05): 443-468