Correlation clustering

Times cited: 0

Authors:

Bansal, N [1]

Blum, A [1]

Chawla, S [1]

Affiliation:

[1] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA

Source:

FOCS 2002: 43RD ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS | 2002

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Subject classification code:

081202

Abstract:

We consider the following clustering problem: we have a complete graph on n vertices (items), where each edge (u, v) is labeled either + or - depending on whether u and v have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of + edges within clusters plus the number of - edges between clusters (equivalently, one that minimizes the number of disagreements: the number of - edges inside clusters plus the number of + edges between clusters). This formulation is motivated by a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of "agnostic learning" problem. An interesting feature of this clustering formulation is that one does not need to specify the number of clusters k as a separate parameter, as in measures such as k-median or min-sum or min-max clustering. Instead, in our formulation, the optimal number of clusters could be any value between 1 and n, depending on the edge labels. We look at approximation algorithms both for minimizing disagreements and for maximizing agreements. For minimizing disagreements, we give a constant factor approximation. For maximizing agreements, we give a PTAS. We also show how to extend some of these results to graphs with edge labels in [-1, +1], and give some results for the case of random noise.
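The objective above can be made concrete with a small sketch. It assumes the edge labels are stored as a dict keyed by vertex pairs (u, v) with u < v, and that a clustering is a list of cluster ids indexed by vertex; all function names (make_labels, disagreements, agreements, set_partitions, best_clustering) are hypothetical. This is not the paper's constant-factor or PTAS algorithm: best_clustering simply enumerates every partition, whose count grows as the Bell numbers, so it is usable only for a handful of vertices. What it does show is that the number of clusters is an output of the optimization rather than an input. For the [-1, +1] extension, the code assumes the convention that an edge of weight w costs (1 - w)/2 inside a cluster and (1 + w)/2 between clusters, which reduces to the plain 0/1 disagreement count for labels of +1 and -1.

```python
from itertools import combinations
from typing import Dict, Iterator, List, Set, Tuple

Edge = Tuple[int, int]
Weights = Dict[Edge, float]  # (u, v) with u < v -> label in [-1, +1]


def make_labels(n: int, plus_edges: Set[Edge]) -> Weights:
    """Label every pair u < v of the complete graph: +1 if listed as similar, else -1."""
    return {(u, v): (1 if (u, v) in plus_edges else -1)
            for u, v in combinations(range(n), 2)}


def disagreements(weights: Weights, clustering: List[int]) -> float:
    """Total disagreement cost of a clustering (clustering[v] = cluster id of v).

    Assumed convention for a weight w in [-1, +1]: an edge inside a cluster
    costs (1 - w) / 2 and an edge between clusters costs (1 + w) / 2.
    For w = +1 or -1 this counts - edges inside clusters plus + edges between them.
    """
    total = 0.0
    for (u, v), w in weights.items():
        if clustering[u] == clustering[v]:
            total += (1 - w) / 2
        else:
            total += (1 + w) / 2
    return total


def agreements(weights: Weights, clustering: List[int]) -> float:
    """Each edge's two costs sum to 1, so agreements = #edges - disagreements."""
    return len(weights) - disagreements(weights, clustering)


def set_partitions(n: int) -> Iterator[List[int]]:
    """Yield every partition of {0, ..., n-1} as a restricted growth string."""
    def extend(prefix: List[int], max_id: int) -> Iterator[List[int]]:
        if len(prefix) == n:
            yield list(prefix)
            return
        # The next vertex joins an existing cluster or opens exactly one new one.
        for c in range(max_id + 2):
            prefix.append(c)
            yield from extend(prefix, max(max_id, c))
            prefix.pop()
    yield from extend([], -1)


def best_clustering(n: int, weights: Weights) -> Tuple[List[int], float]:
    """Exhaustive minimum-disagreement clustering; feasible only for tiny n."""
    best = min(set_partitions(n), key=lambda c: disagreements(weights, c))
    return best, disagreements(weights, best)


if __name__ == "__main__":
    n = 4
    cases = {
        "all +": set(combinations(range(n), 2)),
        "two pairs": {(0, 1), (2, 3)},
        "all -": set(),
    }
    for name, plus in cases.items():
        clustering, cost = best_clustering(n, make_labels(n, plus))
        print(f"{name}: {clustering}, {len(set(clustering))} clusters, cost {cost}")
```

Run as a script, the three labelings of four vertices should yield 1, 2 and 4 clusters respectively, each with zero disagreements, because each labeling is perfectly consistent with exactly one partition. No k was supplied anywhere; the optimum simply follows from the labels.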


Pages: 238-247

Number of pages: 10

