Fair k-Center Clustering in MapReduce and Streaming Settings

Cited by: 6
Authors
Bera, Suman K. [1]
Das, Syamantak [2]
Galhotra, Sainyam [3]
Kale, Sagar Sudhir [4]
Affiliations
[1] Katana Graph, Austin, TX 78705 USA
[2] IIIT Delhi, Delhi, India
[3] Univ Chicago, Chicago, IL 60637 USA
[4] Univ Vienna, Fac Comp Sci, Vienna, Austria
Keywords
fairness; k-center clustering; disparate impact
DOI
10.1145/3485447.3512188
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Center-based clustering techniques are fundamental to many real-world applications such as data summarization and social network analysis. In this work, we study the problem of fairness-aware k-center clustering over large datasets. We are given an input dataset comprising a set of n points, where each point belongs to a specific demographic group characterized by a protected attribute, such as race or gender. The goal is to identify k clusters such that all clusters have considerable representation from all groups and the maximum radius of these clusters is minimized. Most prior techniques do not scale beyond 100K points for k = 50. To address the scalability challenges, we propose an efficient 2-round algorithm for the MapReduce setting that is guaranteed to be a 9-approximation to the optimal solution. Additionally, we develop a 2-pass streaming algorithm that is efficient and has a low memory footprint. These theoretical results are complemented with an empirical evaluation on million-scale datasets, demonstrating that our techniques are effective at identifying high-quality fair clusters and efficient compared to the state of the art.
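The objective described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's MapReduce or streaming algorithm; it only evaluates a given clustering by its maximum radius and checks group representation per cluster. The abstract does not state the exact fairness constraint, so the single `min_share` threshold, the function names, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def kcenter_radius(points, centers, assignment):
    """Largest distance from any point to the center it is assigned to."""
    return max(math.dist(p, centers[c]) for p, c in zip(points, assignment))


def is_fair(groups, assignment, k, min_share):
    """Hypothetical fairness check: in every non-empty cluster, each group
    must make up at least `min_share` of the cluster's points."""
    all_groups = set(groups)
    for c in range(k):
        members = [g for g, a in zip(groups, assignment) if a == c]
        if not members:
            continue  # an empty cluster imposes no constraint in this sketch
        counts = Counter(members)
        for g in all_groups:
            if counts[g] / len(members) < min_share:
                return False
    return True


# Toy data (assumed): four points on a line, two demographic groups.
points = [(0, 0), (1, 0), (10, 0), (11, 0)]
groups = ["a", "b", "a", "b"]
centers = [(0, 0), (10, 0)]
assignment = [0, 0, 1, 1]  # index of the center each point is assigned to

radius = kcenter_radius(points, centers, assignment)
fair = is_fair(groups, assignment, k=2, min_share=0.4)
```

In this toy instance each cluster holds one point from each group, so the check passes; relabeling the groups so that one cluster contains a single group would make it fail while leaving the radius unchanged.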
Pages: 1414-1422
Page count: 9
Related papers
50 in total
  • [1] Solving k-center Clustering (with Outliers) in MapReduce and Streaming, almost as Accurately as Sequentially
    Ceccarello, Matteo
    Pietracaprina, Andrea
    Pucci, Geppino
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2019, 12 (07) : 766 - 778
  • [2] Fair k-center Clustering with Outliers
    Amagata, Daichi
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [3] Fair colorful k-center clustering
    Xinrui Jia
    Kshiteej Sheth
    Ola Svensson
    Mathematical Programming, 2022, 192 : 339 - 360
  • [4] Fair Colorful k-Center Clustering
    Jia, Xinrui
    Sheth, Kshiteej
    Svensson, Ola
    INTEGER PROGRAMMING AND COMBINATORIAL OPTIMIZATION, IPCO 2020, 2020, 12125 : 209 - 222
  • [5] Fair colorful k-center clustering
    Jia, Xinrui
    Sheth, Kshiteej
    Svensson, Ola
    MATHEMATICAL PROGRAMMING, 2022, 192 (1-2) : 339 - 360
  • [6] Streaming Fair k-Center Clustering over Massive Dataset with Performance Guarantee
    Lin, Zeyu
    Guo, Longkun
    Jia, Chaoqi
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT III, PAKDD 2024, 2024, 14647 : 105 - 117
  • [7] Fair k-Center Clustering for Data Summarization
    Kleindessner, Matthaus
    Awasthi, Pranjal
    Morgenstern, Jamie
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [8] k-Center Clustering with Outliers in the MPC and Streaming Model
    de Berg, Mark
    Biabani, Leyla
    Monemizadeh, Morteza
    2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, IPDPS, 2023 : 853 - 863
  • [9] Streaming Algorithms for k-Center Clustering with Outliers and with Anonymity
    McCutchen, Richard Matthew
    Khuller, Samir
    APPROXIMATION RANDOMIZATION AND COMBINATORIAL OPTIMIZATION: ALGORITHMS AND TECHNIQUES, PROCEEDINGS, 2008, 5171 : 165 - 178
  • [10] Distributed Fair k-Center Clustering Problems with Outliers
    Yuan, Fan
    Diao, Luhong
    Du, Donglei
    Liu, Lei
    PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES, PDCAT 2021, 2022, 13148 : 430 - 440