Correlation Clustering with a Fixed Number of Clusters

Cited by: 59
Authors
Giotis, Ioannis [1]
Guruswami, Venkatesan [1]
Affiliation
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Funding
U.S. National Science Foundation
DOI
10.1145/1109557.1109686
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We continue the investigation of problems concerning correlation clustering or clustering with qualitative information, which is a clustering formulation that has been studied recently [5, 7, 8, 3]. The basic setup here is that we are given as input a complete graph on n nodes (which correspond to items to be clustered) whose edges are labeled + (for similar pairs of items) and − (for dissimilar pairs of items). Thus we have as input only qualitative information on similarity and no quantitative distance measure between items. The quality of a clustering is measured in terms of its number of agreements, which is simply the number of edges it correctly classifies, that is, the number of − edges whose endpoints it places in different clusters plus the number of + edges both of whose endpoints it places within the same cluster. In this paper, we study the problem of finding clusterings that maximize the number of agreements, and the complementary minimization version where we seek clusterings that minimize the number of disagreements. We focus on the situation when the number of clusters is stipulated to be a small constant k. Our main result is that for every k, there is a polynomial time approximation scheme for both maximizing agreements and minimizing disagreements. (The problems are NP-hard for every k ≥ 2.) The main technical work is for the minimization version, as the PTAS for maximizing agreements follows along the lines of the property tester for Max k-CUT from [13]. In contrast, when the number of clusters is not specified, the problem of minimizing disagreements was shown to be APX-hard [7], even though the maximization version admits a PTAS.
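The objective defined in the abstract can be illustrated with a small sketch. The code below is not the paper's approximation scheme; it only counts agreements and disagreements for a given clustering and, for tiny inputs, finds an optimal clustering by exhaustive search. The function names, the representation of + edges as a set of unordered pairs, and the reading of "k clusters" as "at most k clusters" are all assumptions made for illustration.

```python
from itertools import combinations, product


def count_agreements(n, plus_edges, assignment):
    """Return (agreements, disagreements) for a clustering.

    n           -- number of nodes, labeled 0..n-1 (complete graph)
    plus_edges  -- set of frozenset({u, v}) pairs labeled '+';
                   every other pair is treated as labeled '-'
    assignment  -- sequence mapping each node to a cluster id
    """
    agree = 0
    disagree = 0
    for u, v in combinations(range(n), 2):
        is_plus = frozenset((u, v)) in plus_edges
        same_cluster = assignment[u] == assignment[v]
        # A '+' edge agrees when its endpoints share a cluster;
        # a '-' edge agrees when they are separated.
        if is_plus == same_cluster:
            agree += 1
        else:
            disagree += 1
    return agree, disagree


def brute_force_max_agree(n, plus_edges, k):
    """Exhaustively search all assignments into at most k clusters.

    Runs in k**n time, so it is only usable on very small inputs;
    it serves as a reference point, not as the PTAS from the paper.
    """
    best_agree, best_assignment = -1, None
    for assignment in product(range(k), repeat=n):
        agree, _ = count_agreements(n, plus_edges, assignment)
        if agree > best_agree:
            best_agree, best_assignment = agree, list(assignment)
    return best_agree, best_assignment


if __name__ == "__main__":
    # Four items: {0, 1} are similar, {2, 3} are similar, all other pairs dissimilar.
    plus = {frozenset((0, 1)), frozenset((2, 3))}
    print(count_agreements(4, plus, [0, 0, 1, 1]))
    print(count_agreements(4, plus, [0, 0, 0, 0]))
    print(brute_force_max_agree(4, plus, 2))
```

Because every edge is either an agreement or a disagreement, the two counts always sum to n(n-1)/2, so the same clustering is optimal for both objectives; the maximization and minimization versions differ only in how hard they are to approximate.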
Pages: 1167-1176
Number of pages: 10