Divide and Imitate: Multi-cluster Identification and Mitigation of Selection Bias

Cited by: 2
Authors
Dost, Katharina [1]
Duncanson, Hamish [1]
Ziogas, Ioannis [2]
Riddle, Patricia [1]
Wicker, Jorg [1]
Affiliations
[1] Univ Auckland, Auckland, New Zealand
[2] Univ Mississippi, Oxford, MS USA
DOI
10.1007/978-3-031-05936-0_12
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning can help overcome human biases in decision making by focussing on purely logical conclusions drawn from the training data. If the training data is biased, however, that bias is transferred to the model and remains undetected, because performance is validated on a test set drawn from the same biased distribution. Existing strategies for identifying and mitigating selection bias generally rely on some knowledge of the bias or of the ground truth. An exception is the Imitate algorithm, which assumes no such knowledge but comes with a strong limitation: it can only model datasets with one normally distributed cluster per class. In this paper, we introduce a novel algorithm, Mimic, which uses Imitate as a building block but relaxes this limitation. By allowing mixtures of multivariate Gaussians, our technique can model multi-cluster datasets and provide solutions for a substantially wider set of problems. Experiments confirm that Mimic not only identifies potential biases in multi-cluster datasets, which can then be corrected early on, but also improves classifier performance.
Pages: 149-160
Number of pages: 12