Data preprocessing techniques for classification without discrimination

Cited by: 640
Authors
Kamiran, Faisal
Calders, Toon
Keywords
Classification; Preprocessing; Discrimination-aware data mining
DOI
10.1007/s10115-011-0463-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recently, the following Discrimination-Aware Classification Problem was introduced: suppose we are given training data that exhibit unlawful discrimination, e.g., toward sensitive attributes such as gender or ethnicity. The task is to learn a classifier that optimizes accuracy but does not reproduce this discrimination in its predictions on test data. This problem is relevant in many settings, such as when the data are generated by a biased decision process or when the sensitive attribute serves as a proxy for unobserved features. In this paper, we concentrate on the case of a single binary sensitive attribute and a two-class classification problem. We first study the theoretically optimal trade-off between accuracy and non-discrimination for pure classifiers. Then, we look at algorithmic solutions that preprocess the data to remove discrimination before a classifier is learned. We survey and extend our existing data preprocessing techniques, namely suppression of the sensitive attribute, massaging the dataset by changing class labels, and reweighing or resampling the data to remove discrimination without relabeling instances. These preprocessing techniques have been implemented in a modified version of Weka, and we present the results of experiments on real-life data.
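The reweighing and massaging ideas summarized in the abstract can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not the authors' Weka implementation: each instance is reduced to a hypothetical `(sensitive_value, label)` pair; discrimination is taken to be the difference in positive-label rates between a favored and a deprived group; reweighing gives every `(s, c)` combination the weight P(S=s)·P(C=c) / P(S=s, C=c), so that the sensitive attribute and the class look independent under the weights; and massaging is reduced to computing how many labels to flip in each group (M = disc·|D_deprived|·|D_favored| / |D|) so that the rate difference becomes zero. Choosing *which* instances to flip requires a ranking model and is left out. All function names and the toy data are illustrative.

```python
from collections import Counter


def discrimination(rows, favored, deprived, positive):
    """Positive-label rate of the favored group minus that of the deprived group."""
    def pos_rate(group):
        labels = [c for s, c in rows if s == group]
        return sum(1 for c in labels if c == positive) / len(labels)
    return pos_rate(favored) - pos_rate(deprived)


def reweighing_weights(rows):
    """Weight per (s, c) pair: expected count under independence / observed count.

    Equivalent to P(S=s) * P(C=c) / P(S=s, C=c), computed from raw counts as
    |D_s| * |D_c| / (|D| * |D_{s,c}|).
    """
    n = len(rows)
    s_counts = Counter(s for s, _ in rows)
    c_counts = Counter(c for _, c in rows)
    sc_counts = Counter(rows)  # counts of each (s, c) tuple
    return {
        (s, c): s_counts[s] * c_counts[c] / (n * sc_counts[(s, c)])
        for (s, c) in sc_counts
    }


def massaging_changes(rows, favored, deprived, positive):
    """Number of labels to flip in EACH group (promote in deprived, demote in favored).

    Flipping M labels per group lowers the favored rate by M/|D_fav| and raises the
    deprived rate by M/|D_dep|; solving for a zero difference gives the formula below.
    Rounded to the nearest integer because only whole instances can be relabeled.
    """
    n = len(rows)
    n_fav = sum(1 for s, _ in rows if s == favored)
    n_dep = sum(1 for s, _ in rows if s == deprived)
    disc = discrimination(rows, favored, deprived, positive)
    return round(disc * n_dep * n_fav / n)


if __name__ == "__main__":
    # Toy data: favored group "m" has 4/6 positives, deprived group "f" has 1/4.
    data = [("m", "+")] * 4 + [("m", "-")] * 2 + [("f", "+")] + [("f", "-")] * 3
    print(discrimination(data, "m", "f", "+"))     # ≈ 0.417
    print(reweighing_weights(data))                # (m,+): 0.75, (m,-): 1.5, (f,+): 2.0, (f,-): ≈ 0.667
    print(massaging_changes(data, "m", "f", "+"))  # 1
```

On the toy data, the weighted positive rate of both groups becomes 0.5, and flipping one label per group (one "f" negative to positive, one "m" positive to negative) also yields 0.5 for both, so either route removes the measured discrimination. Reweighing keeps every label intact, which is the property the abstract highlights for reweighing and resampling.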
Pages: 1–33 (33 pages)