Divide and Imitate: Multi-cluster Identification and Mitigation of Selection Bias

Cited by: 2
Authors
Dost, Katharina [1]
Duncanson, Hamish [1]
Ziogas, Ioannis [2]
Riddle, Patricia [1]
Wicker, Jorg [1]
Affiliations
[1] Univ Auckland, Auckland, New Zealand
[2] Univ Mississippi, Oxford, MS USA
DOI
10.1007/978-3-031-05936-0_12
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning can help overcome human biases in decision making by focussing on purely logical conclusions drawn from the training data. If the training data is biased, however, that bias is transferred to the model and remains undetected, because performance is validated on a test set drawn from the same biased distribution. Existing strategies for identifying and mitigating selection bias generally rely on some knowledge of the bias or of the ground truth. An exception is the Imitate algorithm, which assumes no such knowledge but comes with a strong limitation: it can only model datasets with one normally distributed cluster per class. In this paper, we introduce a novel algorithm, Mimic, which uses Imitate as a building block but relaxes this limitation. By allowing mixtures of multivariate Gaussians, our technique can model multi-cluster datasets and provide solutions for a substantially wider set of problems. Experiments confirm that Mimic not only identifies potential biases in multi-cluster datasets, which can then be corrected early on, but also improves classifier performance.
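The abstract's point that selection bias stays hidden when the test set comes from the same biased distribution can be illustrated with a small sketch. This is not the Imitate or Mimic algorithm; the 1-D Gaussian cluster, the `cutoff` value, and all function names below are hypothetical choices made only for illustration.

```python
import random
import statistics


def sample_population(n, rng):
    """Draw n points from a 1-D standard normal 'ground-truth' cluster."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def apply_selection_bias(points, cutoff):
    """Hypothetical bias: points above `cutoff` are never collected."""
    return [x for x in points if x <= cutoff]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
population = sample_population(20000, rng)
biased = apply_selection_bias(population, cutoff=0.5)

# Train and test are both split from the same biased dataset.
rng.shuffle(biased)
half = len(biased) // 2
train, test = biased[:half], biased[half:]

true_mean = statistics.fmean(population)
train_mean = statistics.fmean(train)
test_mean = statistics.fmean(test)

# Train and test agree with each other, so validation looks fine...
print(f"train vs. test gap:  {abs(train_mean - test_mean):.3f}")
# ...while both are far from the unbiased population.
print(f"train vs. truth gap: {abs(train_mean - true_mean):.3f}")
```

In this toy setting the train and test means nearly coincide while both sit well away from the population mean, which is why a held-out split of the biased data alone cannot reveal the bias.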
Pages: 149-160
Number of pages: 12