k-Center Clustering with Outliers in the MPC and Streaming Model

Cited by: 2
Authors
de Berg, Mark [1]
Biabani, Leyla [1]
Monemizadeh, Morteza [1]
Affiliations
[1] TU Eindhoven, Dept Comp Sci, Eindhoven, Netherlands
Keywords
k-center problem; outliers; coreset; massively parallel computing; streaming; ALGORITHMS;
DOI
10.1109/IPDPS54959.2023.00090
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Given a point set $P \subseteq X$ of size $n$ in a metric space $(X, \mathrm{dist})$ of doubling dimension $d$ and two parameters $k \in \mathbb{N}$ and $z \in \mathbb{N}$, the k-center problem with $z$ outliers asks to return a set $C^* = \{c^*_1, \ldots, c^*_k\} \subseteq X$ of $k$ centers such that the maximum distance of all but $z$ points of $P$ to their nearest center in $C^*$ is minimized. An $(\epsilon, k, z)$-coreset for this problem is a weighted point set $P^*$ such that an optimal solution for the k-center problem with $z$ outliers on $P^*$ gives a $(1 \pm \epsilon)$-approximation for the k-center problem with $z$ outliers on $P$. We study the construction of such coresets in the Massively Parallel Computing (MPC) model, and in the insertion-only as well as the fully dynamic streaming model. We obtain the following results, for any given $0 < \epsilon \le 1$. In all cases, the size of the computed coreset is $O(k/\epsilon^d + z)$. In the MPC model the data are distributed over $m$ machines. One is the coordinator machine, which will contain the final answer; the others are worker machines. We present a deterministic 2-round algorithm using $O(\sqrt{n})$ machines, where the worker machines have $O(\sqrt{nk/\epsilon^d} + \sqrt{n} \cdot \log(z+1))$ local memory, and the coordinator has $O(\sqrt{nk/\epsilon^d} + \sqrt{n} \cdot \log(z+1) + z)$ local memory. The algorithm can handle point sets $P$ that are distributed arbitrarily (possibly adversarially) over the machines. We also present a randomized algorithm that uses only a single round, under the assumption that the input set $P$ is initially distributed randomly over the machines. Then we present a deterministic algorithm that obtains a trade-off between the number of rounds, $R$, and the storage per machine. In the streaming model we have a single machine with limited storage, and $P$ is revealed in a streaming fashion.
We present the first lower bound for the insertion-only streaming model, where the points arrive one by one and no points are deleted. We show that any deterministic algorithm that maintains an $(\epsilon, k, z)$-coreset must use $\Omega(k/\epsilon^d + z)$ space. We complement this with a deterministic streaming algorithm using $O(k/\epsilon^d + z)$ space, which is thus optimal. For fully dynamic data streams, where points can be inserted as well as deleted, we give a randomized algorithm for point sets from a $d$-dimensional discrete Euclidean space $[\Delta]^d$, where $\Delta \in \mathbb{N}$ indicates the size of the universe from which the coordinates are taken. Our algorithm uses only $O((k/\epsilon^d + z) \log^4(k\Delta/(\epsilon\delta)))$ space, and it is the first algorithm for this setting. We also present an $\Omega((k/\epsilon^d) \log \Delta + z)$ lower bound for deterministic fully dynamic streaming algorithms. For the sliding-window model, we show that any deterministic streaming algorithm that guarantees a $(1+\epsilon)$-approximation for the k-center problem with outliers in $\mathbb{R}^d$ must use $\Omega((kz/\epsilon^d) \log \sigma)$ space, where $\sigma$ is the ratio of the largest and smallest distance between any two points in the stream. This (negatively) answers a question posed by De Berg, Monemizadeh, and Zhong [1].
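The objective defined in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `radius_with_outliers`, the Euclidean metric, and the sample points are assumptions chosen for illustration. For a fixed set of centers, it returns the smallest radius that covers all but (total weight) z of the points; the optional weights reflect the assumption that, on a weighted coreset, the discarded points may have total weight at most z.

```python
import math

def radius_with_outliers(points, centers, z, weights=None):
    """Smallest r such that the total weight of points farther than r
    from their nearest center is at most z (hypothetical helper)."""
    if weights is None:
        weights = [1] * len(points)  # unweighted input: every point counts once
    # Pair each point's distance to its nearest center with its weight,
    # sorted from farthest to nearest.
    dists = sorted(
        ((min(math.dist(p, c) for c in centers), w)
         for p, w in zip(points, weights)),
        reverse=True,
    )
    discarded = 0
    for d, w in dists:
        # This point no longer fits in the outlier budget, so it must be covered.
        if discarded + w > z:
            return d
        discarded += w
    return 0.0  # every point fits in the outlier budget

# Illustrative data (assumed): two tight groups and one far-away point.
pts = [(0, 0), (1, 0), (10, 0), (11, 0), (100, 0)]
ctrs = [(0.5, 0), (10.5, 0)]
print(radius_with_outliers(pts, ctrs, z=1))  # 0.5: the far point is an outlier
print(radius_with_outliers(pts, ctrs, z=0))  # 89.5: the far point must be covered
```

The sketch only evaluates a given set of centers; finding optimal centers, or building a coreset of the size stated in the abstract, is the hard part and is not attempted here.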
Pages: 853 - 863
Page count: 11
Related papers
50 in total
  • [21] Robust Hierarchical k-Center Clustering
    Lattanzi, Silvio
    Leonardi, Stefano
    Mirrokni, Vahab
    Razenshteyn, Ilya
    PROCEEDINGS OF THE 6TH INNOVATIONS IN THEORETICAL COMPUTER SCIENCE (ITCS'15), 2015, : 211 - 218
  • [22] Constant Factor Approximation for Capacitated k-Center with Outliers
    Cygan, Marek
    Kociumaka, Tomasz
    31ST INTERNATIONAL SYMPOSIUM ON THEORETICAL ASPECTS OF COMPUTER SCIENCE (STACS 2014), 2014, 25 : 251 - 262
  • [23] k-Center Clustering in Distributed Models
    Biabani, Leyla
    Paz, Ami
    STRUCTURAL INFORMATION AND COMMUNICATION COMPLEXITY, SIROCCO 2024, 2024, 14662 : 83 - 100
  • [24] Approximation algorithms for the individually fair k-center with outliers
    Han, Lu
    Xu, Dachuan
    Xu, Yicheng
    Yang, Ping
    JOURNAL OF GLOBAL OPTIMIZATION, 2023, 87 (2-4) : 603 - 618
  • [25] Global Optimization of K-Center Clustering
    Shi, Mingfei
    Hua, Kaixun
    Ren, Jiayang
    Cao, Yankai
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [26] Fair colorful k-center clustering
    Jia, Xinrui
    Sheth, Kshiteej
    Svensson, Ola
    MATHEMATICAL PROGRAMMING, 2022, 192 (1-2) : 339 - 360
  • [27] Computing k-center over Streaming Data for Small k
    Ahn, Hee-Kap
    Kim, Hyo-Sil
    Kim, Sang-Sub
    Son, Wanbin
    ALGORITHMS AND COMPUTATION, ISAAC 2012, 2012, 7676 : 54 - 63
  • [28] Connected k-Center and k-Diameter Clustering
    Drexler, Lukas
    Eube, Jan
    Luo, Kelin
    Reineccius, Dorian
    Roeglin, Heiko
    Schmidt, Melanie
    Wargalla, Julian
    ALGORITHMICA, 2024, 86 (11) : 3425 - 3464
  • [29] Approximation algorithms for probabilistic k-center clustering
    Alipour, Sharareh
    20TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2020), 2020, : 1 - 11
  • [30] k-center Clustering under Perturbation Resilience
    Balcan, Maria-Florina
    Haghtalab, Nika
    White, Colin
    ACM TRANSACTIONS ON ALGORITHMS, 2020, 16 (02)