Divide and Imitate: Multi-cluster Identification and Mitigation of Selection Bias

Cited by: 2
Authors
Dost, Katharina [1]
Duncanson, Hamish [1]
Ziogas, Ioannis [2]
Riddle, Patricia [1]
Wicker, Jorg [1]
Affiliations
[1] Univ Auckland, Auckland, New Zealand
[2] Univ Mississippi, Oxford, MS USA
DOI
10.1007/978-3-031-05936-0_12
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning can help overcome human biases in decision making by focusing on purely logical conclusions drawn from the training data. If the training data is biased, however, that bias is transferred to the model and remains undetected, because performance is validated on a test set drawn from the same biased distribution. Existing strategies for identifying and mitigating selection bias generally rely on some knowledge of the bias or of the ground truth. An exception is the Imitate algorithm, which assumes no such knowledge but comes with a strong limitation: it can only model datasets with one normally distributed cluster per class. In this paper, we introduce a novel algorithm, Mimic, which uses Imitate as a building block but relaxes this limitation. By allowing mixtures of multivariate Gaussians, our technique can model multi-cluster datasets and provide solutions for a substantially wider set of problems. Experiments confirm that Mimic not only identifies potential biases in multi-cluster datasets, which can then be corrected early on, but also improves classifier performance.
Pages: 149-160
Page count: 12