k-Center Clustering with Outliers in Sliding Windows

Cited by: 3
Authors
Pellizzoni, Paolo [1]
Pietracaprina, Andrea [1]
Pucci, Geppino [1]
Affiliations
[1] Univ Padua, Dept Informat Engn, Via Gradenigo 6-B, I-35131 Padua, Italy
Keywords
k-center with outliers; effective diameter; big data; data stream model; sliding windows; coreset; doubling dimension; approximation algorithms; ALGORITHMS
DOI
10.3390/a15020052
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Metric k-center clustering is a fundamental unsupervised learning primitive. Although widely used, this primitive is heavily affected by noise in the data, so a more sensible variant seeks the best solution that disregards a given number z of points of the dataset, which are called outliers. We provide efficient algorithms for this important variant in the streaming model under the sliding window setting, where, at each time step, the dataset to be clustered is the window W of the most recent data items. For general metric spaces, our algorithms achieve O(1) approximation and, remarkably, require a working memory linear in k+z and only logarithmic in |W|. For spaces of bounded doubling dimension, the approximation can be made arbitrarily close to 3. For these latter spaces, we show, as a by-product, how to estimate the effective diameter of the window W, which is a measure of the spread of the window points, disregarding a given fraction of noisy distances. We also provide experimental evidence of the practical viability of the improved clustering and diameter estimation algorithms.
Pages: 26