Coresets for Clustering with Fairness Constraints

Authors
Huang, Lingxiao [1]
Jiang, Shaofeng H.-C. [2]
Vishnoi, Nisheeth K. [1]
Affiliations
[1] Yale Univ, New Haven, CT 06520 USA
[2] Weizmann Inst Sci, Rehovot, Israel
Funding
Swiss National Science Foundation;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In a recent work, [20] studied the following "fair" variants of classical clustering problems such as k-means and k-median: given a set of n data points in $\mathbb{R}^d$ and a binary type associated with each data point, the goal is to cluster the points while ensuring that the proportion of each type in each cluster is roughly the same as its underlying proportion. Subsequent work has focused either on extending this setting to the case where each data point has multiple, non-disjoint sensitive types such as race and gender [7], or on addressing the problem that the clustering algorithms in the above work do not scale well [42, 8, 6]. The main contribution of this paper is an approach to clustering with fairness constraints that involves multiple, non-disjoint types and is also scalable. Our approach is based on novel constructions of coresets: for the k-median objective, we construct an $\varepsilon$-coreset of size $O(\Gamma k^{2} \varepsilon^{-d})$, where $\Gamma$ is the number of distinct collections of groups that a point may belong to, and for the k-means objective, we show how to construct an $\varepsilon$-coreset of size $O(\Gamma k^{3} \varepsilon^{-d-1})$. The former result is the first known coreset construction for the fair clustering problem with the k-median objective, and the latter result removes the dependence on the size of the full dataset as in [42] and generalizes it to multiple, non-disjoint types. Plugging our coresets into existing algorithms for fair clustering such as [6] results in the fastest algorithms for several cases. Empirically, we assess our approach on the Adult, Bank, Diabetes and Athlete datasets, and show that the coreset sizes are much smaller than the full dataset; applying coresets indeed reduces the running time of computing the fair clustering objective while ensuring that the resulting objective difference is small. We also speed up recent fair clustering algorithms [6, 7] by incorporating our coreset construction.
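As a rough illustration of two quantities the abstract refers to, the sketch below counts the number of distinct collections of groups appearing in a dataset (the quantity written Γ in the coreset size bounds) and reports the proportion of each type inside each cluster, which is what the fairness constraint compares against the overall proportion. It is not the paper's coreset construction; the function names `count_gamma` and `type_proportions` and the toy data are assumptions made for this example only.

```python
from collections import Counter


def count_gamma(memberships):
    """Count distinct collections of groups among the points.

    `memberships` is an iterable of sets, one per data point, listing the
    (possibly overlapping) sensitive groups that point belongs to.
    Illustrative helper, not taken from the paper.
    """
    return len({frozenset(m) for m in memberships})


def type_proportions(labels, types):
    """Return, for each cluster, the fraction of its points of each type.

    `labels[i]` is the cluster of point i and `types[i]` is its type.
    Illustrative helper, not taken from the paper.
    """
    counts = {}
    for cluster, t in zip(labels, types):
        counts.setdefault(cluster, Counter())[t] += 1
    return {
        cluster: {t: n / sum(c.values()) for t, n in c.items()}
        for cluster, c in counts.items()
    }


# Toy data (hypothetical): five points with non-disjoint group memberships.
memberships = [
    {"female", "black"},
    {"female"},
    {"male", "black"},
    {"female", "black"},
    {"male"},
]
gamma = count_gamma(memberships)  # four distinct collections of groups

# Toy clustering (hypothetical): binary types "a"/"b" over two clusters.
labels = [0, 0, 1, 1, 1]
types = ["a", "b", "a", "a", "b"]
proportions = type_proportions(labels, types)
```

In this toy clustering, type "a" makes up 3/5 of all points, half of cluster 0 and two thirds of cluster 1; a fair clustering in the sense of the abstract would keep each per-cluster proportion close to the overall one.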
Pages: 12