Making clusterings fairer by post-processing: algorithms, complexity results and experiments

Cited by: 0
Authors
Davidson, Ian [1]
Bai, Zilong [1]
Tran, Cindy Mylinh [1]
Ravi, S. S. [2, 3]
Affiliations
[1] Univ Calif Davis, Comp Sci Dept, Davis, CA 95616 USA
[2] Univ Virginia, Biocomplex Inst & Initiat, Charlottesville, VA 22904 USA
[3] SUNY Albany, Dept Comp Sci, Albany, NY 12222 USA
Keywords
Clustering; Protected status; Fairness; Algorithms; Complexity
DOI
10.1007/s10618-022-00893-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
While existing fairness work typically focuses on fair-by-design algorithms, here we consider making a fairness-unaware algorithm's output fairer. Specifically, we explore the area of fairness in clustering by modifying clusterings produced by existing algorithms to make them fairer whilst retaining their quality. We formulate the minimal cluster modification for fairness (MCMF) problem, where the input is a given partitional clustering and the goal is to minimally change it so that the clustering is still of good quality but fairer. We show that for a single binary protected status variable, the problem is efficiently solvable (i.e., in the class P) by proving that the constraint matrix for an integer linear programming formulation is totally unimodular. Interestingly, we show that even for a single protected variable, the addition of simple pairwise guidance for clustering (say, to ensure individual-level fairness) makes the MCMF problem computationally intractable (i.e., NP-hard). Experimental results using Twitter, Census and NYT data sets show that our methods can modify existing clusterings for data sets with more than 100,000 instances within minutes on laptops and find clusterings that are as fair as, but of higher quality than, those produced by fair-by-design clustering algorithms. Finally, we explore a challenging practical problem of making a historical clustering (i.e., zipcodes clustered into California's congressional districts) fairer using a new multi-faceted benchmark data set.
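The two quantities the abstract trades off, how fair a clustering is with respect to a single binary protected variable and how much it was changed, can be illustrated with a minimal sketch. This is not the paper's formulation: the fairness notion used here (each cluster's protected fraction must lie between assumed bounds `lower` and `upper`), the function names, and the toy data are all hypothetical choices made for illustration.

```python
from collections import Counter


def protected_fraction_per_cluster(labels, protected):
    """Map each cluster id to the fraction of its members with protected == 1."""
    sizes = Counter(labels)
    protected_counts = Counter(c for c, p in zip(labels, protected) if p)
    return {c: protected_counts[c] / sizes[c] for c in sizes}


def is_fair(labels, protected, lower, upper):
    """Assumed fairness notion: every cluster's protected fraction is in [lower, upper]."""
    fractions = protected_fraction_per_cluster(labels, protected)
    return all(lower <= f <= upper for f in fractions.values())


def num_changes(old_labels, new_labels):
    """Count the instances whose cluster assignment differs between two clusterings."""
    return sum(o != n for o, n in zip(old_labels, new_labels))


# Toy data (hypothetical): six instances, two clusters, fully segregated at first.
protected = [1, 1, 1, 0, 0, 0]
original = [0, 0, 0, 1, 1, 1]
modified = [0, 0, 1, 1, 1, 0]  # instances 2 and 5 swapped between clusters

fair_before = is_fair(original, protected, lower=0.3, upper=0.7)
fair_after = is_fair(modified, protected, lower=0.3, upper=0.7)
changes = num_changes(original, modified)
```

In this toy case the modified clustering satisfies the assumed bounds after two reassignments. Finding the smallest such set of reassignments is the optimization problem the abstract describes; the sketch only evaluates a candidate modification and does not solve it.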
Pages: 1404-1440
Number of pages: 37