k-Center Clustering with Outliers in Sliding Windows

Cited by: 3
Authors
Pellizzoni, Paolo [1]
Pietracaprina, Andrea [1]
Pucci, Geppino [1]
Affiliations
[1] Univ Padua, Dept Informat Engn, Via Gradenigo 6-B, I-35131 Padua, Italy
Keywords
k-center with outliers; effective diameter; big data; data stream model; sliding windows; coreset; doubling dimension; approximation algorithms; ALGORITHMS
DOI
10.3390/a15020052
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Metric k-center clustering is a fundamental unsupervised learning primitive. Although widely used, this primitive is heavily affected by noise in the data, so a more sensible variant seeks the best solution that disregards a given number z of points of the dataset, which are called outliers. We provide efficient algorithms for this important variant in the streaming model under the sliding window setting, where, at each time step, the dataset to be clustered is the window W of the most recent data items. For general metric spaces, our algorithms achieve O(1) approximation and, remarkably, require a working memory linear in k+z and only logarithmic in |W|. For spaces of bounded doubling dimension, the approximation can be made arbitrarily close to 3. For these latter spaces, we show, as a by-product, how to estimate the effective diameter of the window W, which is a measure of the spread of the window points, disregarding a given fraction of noisy distances. We also provide experimental evidence of the practical viability of the improved clustering and diameter estimation algorithms.
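The objective described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it stores the whole window explicitly (exactly what the paper's small-memory algorithms avoid) and only evaluates the cost of a given set of centers when the z farthest points are disregarded. The function names, the Euclidean metric, and the sample data are assumptions made for illustration.

```python
import math
from collections import deque


def radius_with_outliers(points, centers, z):
    """Cost of `centers` on `points` when the z farthest points are ignored.

    Hypothetical helper: assumes points are coordinate tuples and the
    metric is Euclidean distance.
    """
    # Distance from each point to its closest center, in ascending order.
    dists = sorted(min(math.dist(p, c) for c in centers) for p in points)
    # Discard the z largest distances (the outliers).
    kept = dists[:max(0, len(dists) - z)]
    # The cost is the largest remaining distance (0.0 if nothing remains).
    return kept[-1] if kept else 0.0


def sliding_window_costs(stream, window_size, centers, z):
    """Yield the cost on the window of the `window_size` most recent items.

    Naive baseline: keeps the full window in memory.
    """
    window = deque(maxlen=window_size)  # oldest item is evicted automatically
    for item in stream:
        window.append(item)
        yield radius_with_outliers(window, centers, z)


points = [(0, 0), (1, 0), (10, 0), (11, 0), (100, 0)]
centers = [(0, 0), (10, 0)]
cost_no_outliers = radius_with_outliers(points, centers, z=0)
cost_one_outlier = radius_with_outliers(points, centers, z=1)
costs = list(sliding_window_costs(points, 3, centers, z=1))
```

In this toy instance, allowing a single outlier removes the isolated point (100, 0) from the cost, which shows why the plain k-center objective is sensitive to noise.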
Pages: 26
Related papers
  • [1] Fair k-center Clustering with Outliers
    Amagata, Daichi
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [2] Fully Dynamic k-Center Clustering with Outliers
    Chan, T-H Hubert
    Lattanzi, Silvio
    Sozio, Mauro
    Wang, Bo
    COMPUTING AND COMBINATORICS, COCOON 2022, 2022, 13595 : 150 - 161
  • [3] Fully Dynamic k-Center Clustering with Outliers
    Chan, T.-H. Hubert
    Lattanzi, Silvio
    Sozio, Mauro
    Wang, Bo
    ALGORITHMICA, 2024, 86 (01) : 171 - 193
  • [4] Fully Dynamic k-Center Clustering with Outliers
    T.-H. Hubert Chan
    Silvio Lattanzi
    Mauro Sozio
    Bo Wang
    Algorithmica, 2024, 86 : 171 - 193
  • [5] Distributed Fair k-Center Clustering Problems with Outliers
    Yuan, Fan
    Diao, Luhong
    Du, Donglei
    Liu, Lei
    PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES, PDCAT 2021, 2022, 13148 : 430 - 440
  • [6] k-Center Clustering with Outliers in the MPC and Streaming Model
    de Berg, Mark
    Biabani, Leyla
    Monemizadeh, Morteza
    2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, IPDPS, 2023, : 853 - 863
  • [7] Streaming Algorithms for k-Center Clustering with Outliers and with Anonymity
    McCutchen, Richard Matthew
    Khuller, Samir
    APPROXIMATION RANDOMIZATION AND COMBINATORIAL OPTIMIZATION: ALGORITHMS AND TECHNIQUES, PROCEEDINGS, 2008, 5171 : 165 - 178
  • [8] Adaptive k-center and diameter estimation in sliding windows
    Pellizzoni, Paolo
    Pietracaprina, Andrea
    Pucci, Geppino
    INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2022, 14 (02) : 155 - 173
  • [9] Dimensionality-adaptive k-center in sliding windows
    Pellizzoni, Paolo
    Pietracaprina, Andrea
    Pucci, Geppino
    2020 IEEE 7TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2020), 2020, : 197 - 206
  • [10] Adaptive k-center and diameter estimation in sliding windows
    Paolo Pellizzoni
    Andrea Pietracaprina
    Geppino Pucci
    International Journal of Data Science and Analytics, 2022, 14 : 155 - 173