Data preprocessing techniques for classification without discrimination

Cited by: 640
Authors
Kamiran, Faisal
Calders, Toon
Keywords
Classification; Preprocessing; Discrimination-aware data mining
DOI
10.1007/s10115-011-0463-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recently, the following Discrimination-Aware Classification Problem was introduced: Suppose we are given training data that exhibit unlawful discrimination, e.g., with respect to sensitive attributes such as gender or ethnicity. The task is to learn a classifier that optimizes accuracy but does not exhibit this discrimination in its predictions on test data. This problem is relevant in many settings, such as when the data are generated by a biased decision process or when the sensitive attribute serves as a proxy for unobserved features. In this paper, we concentrate on the case with only one binary sensitive attribute and a two-class classification problem. We first study the theoretically optimal trade-off between accuracy and non-discrimination for pure classifiers. Then, we look at algorithmic solutions that preprocess the data to remove discrimination before a classifier is learned. We survey and extend our existing data preprocessing techniques, namely suppression of the sensitive attribute, massaging the dataset by changing class labels, and reweighing or resampling the data to remove discrimination without relabeling instances. These preprocessing techniques have been implemented in a modified version of Weka, and we present the results of experiments on real-life data.
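The reweighing idea mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract itself gives no formulas, so the sketch rests on two assumptions: discrimination is measured as the difference in positive-class rates between the favored group and the other group, and each instance with sensitive value s and class c receives the weight P(s) * P(c) / P(s, c), which makes the sensitive attribute and the class label statistically independent in the weighted data. The function names `discrimination` and `reweigh` are hypothetical and are not taken from the authors' Weka implementation.

```python
from collections import Counter


def discrimination(sensitive, labels, favored, positive, weights=None):
    """Weighted positive rate of the favored group minus that of the other group.

    Assumed measure for illustration; with weights=None every instance counts 1.
    """
    if weights is None:
        weights = [1.0] * len(labels)
    pos = {True: 0.0, False: 0.0}  # weighted count of positive instances per group
    tot = {True: 0.0, False: 0.0}  # weighted count of all instances per group
    for s, c, w in zip(sensitive, labels, weights):
        in_favored = (s == favored)
        tot[in_favored] += w
        if c == positive:
            pos[in_favored] += w
    return pos[True] / tot[True] - pos[False] / tot[False]


def reweigh(sensitive, labels):
    """Return one weight per instance: P(s) * P(c) / P(s, c), from empirical counts.

    Labels are left untouched; only instance weights change.
    """
    n = len(labels)
    n_s = Counter(sensitive)                # counts per sensitive value
    n_c = Counter(labels)                   # counts per class
    n_sc = Counter(zip(sensitive, labels))  # joint counts
    return [
        (n_s[s] * n_c[c]) / (n * n_sc[(s, c)])
        for s, c in zip(sensitive, labels)
    ]


if __name__ == "__main__":
    # Toy data (hypothetical): 5 instances per group, group "m" is favored.
    sens = ["m"] * 5 + ["f"] * 5
    labs = ["+", "+", "+", "+", "-", "+", "+", "-", "-", "-"]
    w = reweigh(sens, labs)
    print("before:", round(discrimination(sens, labs, "m", "+"), 6))
    print("after: ", round(discrimination(sens, labs, "m", "+", w), 6))
```

The resulting weights could then be passed to any learner that accepts per-instance weights. Resampling, the alternative named in the abstract, would instead draw instances with probability proportional to these weights.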
Pages: 1-33
Page count: 33