Making clusterings fairer by post-processing: algorithms, complexity results and experiments

Authors
Davidson, Ian [1]
Bai, Zilong [1]
Tran, Cindy Mylinh [1]
Ravi, S. S. [2, 3]
Affiliations
[1] Univ Calif Davis, Comp Sci Dept, Davis, CA 95616 USA
[2] Univ Virginia, Biocomplex Inst & Initiat, Charlottesville, VA 22904 USA
[3] SUNY Albany, Dept Comp Sci, Albany, NY 12222 USA
Keywords
Clustering; Protected status; Fairness; Algorithms; Complexity
DOI
10.1007/s10618-022-00893-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
While existing fairness work typically focuses on fair-by-design algorithms, here we consider making a fairness-unaware algorithm's output fairer. Specifically, we explore the area of fairness in clustering by modifying clusterings produced by existing algorithms to make them fairer whilst retaining their quality. We formulate the minimal cluster modification for fairness (MCMF) problem, where the input is a given partitional clustering and the goal is to minimally change it so that the clustering is still of good quality but fairer. We show that for a single binary protected status variable, the problem is efficiently solvable (i.e., in the class P) by proving that the constraint matrix for an integer linear programming formulation is totally unimodular. Interestingly, we show that even for a single protected variable, the addition of simple pairwise guidance for clustering (say, to ensure individual-level fairness) makes the MCMF problem computationally intractable (i.e., NP-hard). Experimental results using Twitter, Census and NYT data sets show that our methods can modify existing clusterings for data sets with more than 100,000 instances within minutes on laptops and find clusterings that are as fair as, but of higher quality than, those produced by fair-by-design clustering algorithms. Finally, we explore a challenging practical problem of making a historical clustering (i.e., zipcodes clustered into California's congressional districts) fairer using a new multi-faceted benchmark data set.
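To make the problem statement concrete, the following is a toy brute-force sketch of the minimal-modification idea for a single binary protected variable. It is not the paper's integer linear programming method (which is what makes the problem tractable at scale). The fairness rule used here (each cluster's protected fraction must lie in a band `[lo, hi]`), the cost measure (number of reassigned points), and the names `is_fair` and `minimal_modification` are all assumptions made for illustration.

```python
from itertools import product


def is_fair(assign, protected, k, lo, hi):
    """Return True if every cluster's protected fraction lies in [lo, hi].

    Assumption (illustrative only): a clustering with an empty cluster
    is treated as invalid.
    """
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if not members:
            return False
        frac = sum(protected[i] for i in members) / len(members)
        if not (lo <= frac <= hi):
            return False
    return True


def minimal_modification(assign, protected, k, lo, hi):
    """Exhaustively search all k**n assignments and return the fair one
    that differs from the input in the fewest positions.

    Exponential in n, so usable only on tiny examples.
    """
    best, best_cost = None, None
    for cand in product(range(k), repeat=len(assign)):
        if not is_fair(cand, protected, k, lo, hi):
            continue
        # Cost = number of points whose cluster label changed.
        cost = sum(a != b for a, b in zip(assign, cand))
        if best_cost is None or cost < best_cost:
            best, best_cost = list(cand), cost
    return best, best_cost


# Hypothetical input: two clusters, perfectly segregated by protected status.
assign = [0, 0, 0, 1, 1, 1]
protected = [1, 1, 1, 0, 0, 0]
best, cost = minimal_modification(assign, protected, k=2, lo=0.3, hi=0.7)
print(best, cost)
```

On this input no single reassignment satisfies the band, so the search has to move at least two points (for example, swapping one protected and one unprotected point between the clusters).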
Pages: 1404-1440
Page count: 37