Fair k-Center Clustering in MapReduce and Streaming Settings

Cited by: 6
Authors
Bera, Suman K. [1]
Das, Syamantak [2]
Galhotra, Sainyam [3]
Kale, Sagar Sudhir [4]
Affiliations
[1] Katana Graph, Austin, TX 78705 USA
[2] IIIT Delhi, Delhi, India
[3] Univ Chicago, Chicago, IL 60637 USA
[4] Univ Vienna, Fac Comp Sci, Vienna, Austria
Keywords
fairness; k-center clustering; disparate impact
DOI
10.1145/3485447.3512188
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Center-based clustering techniques are fundamental to many real-world applications such as data summarization and social network analysis. In this work, we study the problem of fairness-aware k-center clustering over large datasets. We are given an input dataset comprising a set of n points, where each point belongs to a specific demographic group characterized by a protected attribute, such as race or gender. The goal is to identify k clusters such that all clusters have considerable representation from all groups and the maximum radius of these clusters is minimized. The majority of prior techniques do not scale beyond 100K points for k = 50. To address these scalability challenges, we propose an efficient 2-round algorithm for the MapReduce setting that is guaranteed to be a 9-approximation to the optimal solution. Additionally, we develop a 2-pass streaming algorithm that is efficient and has a low memory footprint. These theoretical results are complemented by an empirical evaluation on million-scale datasets, demonstrating that our techniques are effective at identifying high-quality fair clusters and efficient compared with state-of-the-art methods.
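The objective described in the abstract (minimize the maximum cluster radius while watching each group's representation per cluster) can be illustrated with a small sketch. This is not the paper's MapReduce or streaming algorithm: it uses the classic farthest-first traversal heuristic for unconstrained k-center, and then only measures how the groups are distributed across the resulting clusters. The function names `greedy_k_center` and `group_shares`, the Euclidean metric, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def greedy_k_center(points, k):
    """Farthest-first traversal for plain (unconstrained) k-center.

    Returns the indices of the chosen centers and the resulting radius,
    i.e. the largest distance from any point to its nearest center.
    """
    centers = [0]  # arbitrary starting center: the first point
    dist = [math.dist(points[0], p) for p in points]
    while len(centers) < min(k, len(points)):
        # Pick the point currently farthest from all chosen centers.
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        dist = [min(d, math.dist(points[nxt], p)) for d, p in zip(dist, points)]
    return centers, max(dist)


def group_shares(points, groups, centers):
    """Assign each point to its nearest center and report, per cluster,
    the fraction of members belonging to each group."""
    counts = {c: Counter() for c in centers}
    for p, g in zip(points, groups):
        nearest = min(centers, key=lambda c: math.dist(points[c], p))
        counts[nearest][g] += 1
    return {
        c: {g: n / sum(cnt.values()) for g, n in cnt.items()}
        for c, cnt in counts.items()
    }


# Toy data (hypothetical): two tight pairs of points, one group "b" member.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
grp = ["a", "b", "a", "a"]
centers, radius = greedy_k_center(pts, 2)
shares = group_shares(pts, grp, centers)
print(centers, radius, shares)
```

On this toy input the second cluster contains no member of group "b", which is the kind of imbalance a fairness-aware formulation is meant to rule out, typically at the cost of a larger radius.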
Pages: 1414-1422
Number of pages: 9