Data preprocessing techniques for classification without discrimination

Cited by: 640
Authors
Kamiran, Faisal
Calders, Toon
Keywords
Classification; Preprocessing; Discrimination-aware data mining;
DOI
10.1007/s10115-011-0463-8
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, the following Discrimination-Aware Classification Problem was introduced: Suppose we are given training data that exhibit unlawful discrimination, for example with respect to sensitive attributes such as gender or ethnicity. The task is to learn a classifier that optimizes accuracy but whose predictions on test data are free of this discrimination. This problem is relevant in many settings, such as when the data are generated by a biased decision process or when the sensitive attribute serves as a proxy for unobserved features. In this paper, we concentrate on the case with only one binary sensitive attribute and a two-class classification problem. We first study the theoretically optimal trade-off between accuracy and non-discrimination for pure classifiers. Then, we look at algorithmic solutions that preprocess the data to remove discrimination before a classifier is learned. We survey and extend our existing data preprocessing techniques, namely suppression of the sensitive attribute, massaging the dataset by changing class labels, and reweighing or resampling the data to remove discrimination without relabeling instances. These preprocessing techniques have been implemented in a modified version of Weka, and we present the results of experiments on real-life data.
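The reweighing idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so the details here are assumptions: discrimination is taken to be the difference in positive-class rates between the favored and the deprived group, and each instance is weighted by the expected over the observed frequency of its (sensitive value, class) cell, where "expected" means the frequency under independence of the two. The function names `discrimination` and `reweigh` and the toy data are hypothetical and are not taken from the paper or from Weka.

```python
from collections import Counter


def discrimination(sensitive, labels, deprived, positive, weights=None):
    """Weighted positive rate of the favored group minus that of the deprived group."""
    if weights is None:
        weights = [1.0] * len(labels)
    total = {True: 0.0, False: 0.0}  # weight mass per group (key: is deprived?)
    pos = {True: 0.0, False: 0.0}    # positive-label weight mass per group
    for s, c, w in zip(sensitive, labels, weights):
        key = (s == deprived)
        total[key] += w
        if c == positive:
            pos[key] += w
    return pos[False] / total[False] - pos[True] / total[True]


def reweigh(sensitive, labels):
    """Assumed weighting: expected / observed frequency of each (s, c) cell."""
    n = len(labels)
    s_count = Counter(sensitive)
    c_count = Counter(labels)
    sc_count = Counter(zip(sensitive, labels))
    return [
        (s_count[s] * c_count[c]) / (n * sc_count[(s, c)])
        for s, c in zip(sensitive, labels)
    ]


# Toy data, made up purely for illustration (not from the paper's experiments).
sensitive = ["f", "f", "f", "f", "m", "m", "m", "m", "m", "m"]
labels = ["-", "-", "-", "+", "+", "+", "+", "+", "-", "-"]

weights = reweigh(sensitive, labels)
before = discrimination(sensitive, labels, deprived="f", positive="+")
after = discrimination(sensitive, labels, deprived="f", positive="+", weights=weights)
print(f"discrimination before: {before:.3f}, after reweighing: {after:.3f}")
```

On this toy data the weighted positive rates of the two groups coincide, so the measured discrimination drops to zero (up to floating-point error) without any label being changed. A classifier that accepts per-instance weights could then be trained on the weighted data; how well this carries over to test predictions is an empirical question that the sketch does not address.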
Pages: 1-33
Page count: 33
Related papers
50 in total
  • [1] Data preprocessing techniques for classification without discrimination
    Faisal Kamiran
    Toon Calders
    Knowledge and Information Systems, 2012, 33 : 1 - 33
  • [2] Classification and Preprocessing in the Stock Data
    Juszczuk, Przemyslaw
    Kozak, Jan
    BUSINESS INFORMATION SYSTEMS WORKSHOPS, BIS 2017, 2017, 303 : 269 - 281
  • [3] Improved preprocessing and data clustering for landmine discrimination
    Mereddy, P
    Agarwal, S
    Rao, V
    DETECTION AND REMEDIATION TECHNOLOGIES FOR MINES AND MINELIKE TARGETS V, PTS 1 AND 2, 2000, 4038 : 1341 - 1351
  • [4] A survey on preprocessing and classification techniques for acoustic scene
    Singh, Vikash Kumar
    Sharma, Kalpana
    Sur, Samarendra Nath
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 229
  • [5] Evaluation of preprocessing techniques for chief complaint classification
    Dara, Jagan
    Dowling, John N.
    Travers, Debbie
    Cooper, Gregory F.
    Chapman, Wendy W.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2008, 41 (04) : 613 - 623
  • [6] A method for evaluating data-preprocessing techniques for odor classification with an array of gas sensors
    Gutierrez-Osuna, R
    Nagle, HT
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 1999, 29 (05) : 626 - 632
  • [7] MC: a Unsupervised Data Preprocessing for Classification
    Hu, Enliang
    Chen, Songcan
    Yin, Xuesong
    2008 INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY APPLICATION, VOL I, PROCEEDINGS, 2008, : 259 - 263
  • [8] An individualized preprocessing for medical data classification
    AlMuhaideb, Sarab
    Menai, Mohamed El Bachir
    4TH SYMPOSIUM ON DATA MINING APPLICATIONS (SDMA2016), 2016, 82 : 35 - 42
  • [9] Impact of preprocessing on medical data classification
    Sarab Almuhaideb
    Mohamed El Bachir Menai
    Frontiers of Computer Science, 2016, 10 (06) : 1082 - 1102