Fair k-Center Clustering in MapReduce and Streaming Settings

Cited by: 6
Authors
Bera, Suman K. [1]
Das, Syamantak [2]
Galhotra, Sainyam [3]
Kale, Sagar Sudhir [4]
Affiliations
[1] Katana Graph, Austin, TX 78705 USA
[2] IIIT Delhi, Delhi, India
[3] Univ Chicago, Chicago, IL 60637 USA
[4] Univ Vienna, Fac Comp Sci, Vienna, Austria
Keywords
fairness; k-center clustering; disparate impact;
DOI
10.1145/3485447.3512188
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Center-based clustering techniques are fundamental to many real-world applications such as data summarization and social network analysis. In this work, we study the problem of fairness-aware k-center clustering over large datasets. We are given an input dataset comprising a set of n points, where each point belongs to a specific demographic group characterized by a protected attribute, such as race or gender. The goal is to identify k clusters such that all clusters have considerable representation from all groups and the maximum radius of these clusters is minimized. The majority of the prior techniques do not scale beyond 100K points for k = 50. To address the scalability challenges, we propose an efficient 2-round algorithm for the MapReduce setting that is guaranteed to be a 9-approximation to the optimal solution. Additionally, we develop a 2-pass streaming algorithm that is efficient and has a low memory footprint. These theoretical results are complemented with an empirical evaluation on million-scale datasets, demonstrating that our techniques are effective at identifying high-quality fair clusters and are efficient compared to the state of the art.
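To make the two quantities in the abstract concrete, the following is a minimal sketch of how a candidate clustering could be scored: the maximum radius (the k-center objective) and the smallest share any group holds in any cluster (a simple stand-in for "considerable representation"). It is not the paper's MapReduce or streaming algorithm; the function name `evaluate_clustering`, the Euclidean metric, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import dist


def evaluate_clustering(points, groups, centers, assignment):
    """Score a clustering (hypothetical helper, not from the paper).

    points:     list of coordinate tuples
    groups:     protected-attribute label of each point
    centers:    list of center coordinates
    assignment: index into `centers` for each point
    Returns (max_radius, min_group_share).
    """
    # k-center objective: largest distance from any point to its center.
    max_radius = max(dist(p, centers[c]) for p, c in zip(points, assignment))

    # Count group members inside every non-empty cluster.
    per_cluster = {}
    for g, c in zip(groups, assignment):
        per_cluster.setdefault(c, Counter())[g] += 1

    # Smallest fraction that any group has in any cluster;
    # a group absent from a cluster contributes a share of 0.
    all_groups = set(groups)
    min_share = min(
        counts[g] / sum(counts.values())
        for counts in per_cluster.values()
        for g in all_groups
    )
    return max_radius, min_share


# Toy data (assumed): two groups, two clusters.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 2.0)]
groups = ["a", "b", "a", "b"]
centers = [(0.0, 0.0), (10.0, 0.0)]
assignment = [0, 0, 1, 1]
radius, share = evaluate_clustering(points, groups, centers, assignment)
```

A fair clustering in the abstract's sense keeps `share` above some threshold in every cluster while making `radius` as small as possible; the exact fairness constraint used in the paper may differ from this simplified measure.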
Pages: 1414-1422
Number of pages: 9
Related papers
50 in total
  • [21] k-Center Clustering in Distributed Models
    Biabani, Leyla
    Paz, Ami
    STRUCTURAL INFORMATION AND COMMUNICATION COMPLEXITY, SIROCCO 2024, 2024, 14662 : 83 - 100
  • [22] Global Optimization of K-Center Clustering
    Shi, Mingfei
    Hua, Kaixun
    Ren, Jiayang
    Cao, Yankai
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [23] Computing k-center over Streaming Data for Small k
    Ahn, Hee-Kap
    Kim, Hyo-Sil
    Kim, Sang-Sub
    Son, Wanbin
    ALGORITHMS AND COMPUTATION, ISAAC 2012, 2012, 7676 : 54 - 63
  • [24] Connected k-Center and k-Diameter Clustering
    Drexler, Lukas
    Eube, Jan
    Luo, Kelin
    Reineccius, Dorian
    Roeglin, Heiko
    Schmidt, Melanie
    Wargalla, Julian
    ALGORITHMICA, 2024, 86 (11) : 3425 - 3464
  • [25] Approximation algorithms for probabilistic k-center clustering
    Alipour, Sharareh
    20TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2020), 2020, : 1 - 11
  • [26] k-center Clustering under Perturbation Resilience
    Balcan, Maria-Florina
    Haghtalab, Nika
    White, Colin
    ACM TRANSACTIONS ON ALGORITHMS, 2020, 16 (02)
  • [27] Fully Dynamic Consistent k-Center Clustering
    Lacki, Jakub
    Haeupler, Bernhard
    Grunau, Christoph
    Rozhon, Vaclav
    Jayaram, Rajesh
    PROCEEDINGS OF THE 2024 ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, SODA, 2024, : 3463 - 3484
  • [28] Fully Dynamic k-Center Clustering with Outliers
    Chan, T-H Hubert
    Lattanzi, Silvio
    Sozio, Mauro
    Wang, Bo
    COMPUTING AND COMBINATORICS, COCOON 2022, 2022, 13595 : 150 - 161
  • [29] On the complexity of approximation streaming algorithms for the k-center problem
    Abdelguerfi, Mahdi
    Chen, Zhixiang
    Fu, Bin
    FRONTIERS IN ALGORITHMICS, PROCEEDINGS, 2007, 4613 : 160 - +
  • [30] Fair k-Center Problem with Outliers on Massive Data
    Yuan, Fan
    Diao, Luhong
    Du, Donglei
    Liu, Lei
    TSINGHUA SCIENCE AND TECHNOLOGY, 2023, 28 (06): : 1072 - 1084