Correlation clustering

Cited by: 0
Authors
Bansal, N [1]
Blum, A [1]
Chawla, S [1]
Affiliation
[1] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA
Keywords
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
We consider the following clustering problem: we have a complete graph on n vertices (items), where each edge (u, v) is labeled either + or - depending on whether u and v have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of + edges within clusters plus the number of - edges between clusters (equivalently, one that minimizes the number of disagreements: the number of - edges inside clusters plus the number of + edges between clusters). This formulation is motivated by a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of "agnostic learning" problem. An interesting feature of this clustering formulation is that one does not need to specify the number of clusters k as a separate parameter, as in measures such as k-median, min-sum, or min-max clustering. Instead, in our formulation, the optimal number of clusters could be any value between 1 and n, depending on the edge labels. We look at approximation algorithms for both minimizing disagreements and maximizing agreements. For minimizing disagreements, we give a constant-factor approximation. For maximizing agreements, we give a PTAS. We also show how to extend some of these results to graphs with edge labels in [-1, +1], and give some results for the case of random noise.
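The objective described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm; it only scores a given clustering by counting agreements and disagreements. The function name `count_agreements_disagreements`, the dictionary-based input format, and the four-vertex example are assumptions made for illustration.

```python
from itertools import combinations


def count_agreements_disagreements(labels, clustering):
    """Score a clustering of a complete +/- labeled graph.

    labels:     dict mapping frozenset({u, v}) to "+" or "-" (hypothetical format)
    clustering: dict mapping each vertex to a cluster id
    Returns (agreements, disagreements).
    """
    agreements = 0
    disagreements = 0
    for u, v in combinations(sorted(clustering), 2):
        same_cluster = clustering[u] == clustering[v]
        positive = labels[frozenset((u, v))] == "+"
        # A pair agrees when a + edge lies inside a cluster
        # or a - edge crosses between clusters.
        if same_cluster == positive:
            agreements += 1
        else:
            disagreements += 1
    return agreements, disagreements


# Illustrative instance: a-b and a-c are similar, every other pair is different.
labels = {
    frozenset(("a", "b")): "+",
    frozenset(("a", "c")): "+",
    frozenset(("b", "c")): "-",
    frozenset(("a", "d")): "-",
    frozenset(("b", "d")): "-",
    frozenset(("c", "d")): "-",
}

merged = {"a": 0, "b": 0, "c": 0, "d": 1}      # clusters {a, b, c} and {d}
singletons = {"a": 0, "b": 1, "c": 2, "d": 3}  # every vertex alone

print(count_agreements_disagreements(labels, merged))      # (5, 1)
print(count_agreements_disagreements(labels, singletons))  # (4, 2)
```

In this instance the labels on the triangle a, b, c are inconsistent (+, +, -), so no clustering reaches zero disagreements. The number of clusters is not an input here: it is whatever the scored partition happens to use.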
Pages: 238-247
Page count: 10