An improved categorization of classifier's sensitivity on sample selection bias

Cited by: 0
Authors
Fan, W [1]
Davidson, I [1]
Zadrozny, B [1]
Yu, PS [1]
Affiliations
[1] IBM Corp, TJ Watson Res Ctr, Hawthorne, NY 10532 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A recent paper categorizes classifier learning algorithms according to their sensitivity to a common type of sample selection bias, in which the chance of an example being selected into the training sample depends on its feature vector x but not (directly) on its class label y. A classifier learner is categorized as "local" if it is insensitive to this type of sample selection bias; otherwise, it is considered "global". In that paper, the true model is not clearly distinguished from the model that the algorithm outputs. In its discussion of Bayesian classifiers, logistic regression, and hard-margin SVMs, the true model (that is, the model that generates the true class label for every example) is implicitly assumed to be contained in the model space of the learner, and the true class probabilities and the model-estimated class probabilities are assumed to converge asymptotically as the training set size increases. However, in the discussion of naive Bayes, decision trees, and soft-margin SVMs, the model space is assumed not to contain the true model, and these three algorithms are instead argued to be "global learners". We argue that most classifier learners may or may not be affected by sample selection bias; this depends on the dataset as well as on the heuristics or inductive bias implied by the learning algorithm and their appropriateness to the particular dataset.
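The distinction drawn in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper; the probabilities, function names, and the two toy "learners" are all hypothetical choices made for illustration. It draws examples with a binary feature x, selects them into a biased training sample with a probability that depends only on x, and then compares two estimators: a per-x frequency estimate of P(y=1 | x), whose model space contains the true model, and a constant estimate of P(y=1), whose model space does not.

```python
import random

# Hypothetical true model and selection mechanism (assumptions for illustration).
P_Y_GIVEN_X = {0: 0.2, 1: 0.8}   # true P(y=1 | x)
P_SELECT = {0: 0.9, 1: 0.1}      # P(selected | x): depends on x only, not on y


def simulate(n, seed=0):
    """Return (full sample, biased sample) of (x, y) pairs."""
    rng = random.Random(seed)
    full, biased = [], []
    for _ in range(n):
        x = rng.randint(0, 1)
        y = 1 if rng.random() < P_Y_GIVEN_X[x] else 0
        full.append((x, y))
        # Selection looks at x only, so P(y | x, selected) = P(y | x).
        if rng.random() < P_SELECT[x]:
            biased.append((x, y))
    return full, biased


def conditional_rate(sample, x_value):
    """Per-x frequency estimate of P(y=1 | x); can represent the true model."""
    ys = [y for x, y in sample if x == x_value]
    return sum(ys) / len(ys)


def marginal_rate(sample):
    """Constant estimate of P(y=1); cannot represent the dependence on x."""
    return sum(y for _, y in sample) / len(sample)


full, biased = simulate(200_000)
for x in (0, 1):
    print(f"x={x}: full={conditional_rate(full, x):.3f} "
          f"biased={conditional_rate(biased, x):.3f}")
print(f"constant model: full={marginal_rate(full):.3f} "
      f"biased={marginal_rate(biased):.3f}")
```

Under these assumptions, the per-x estimates should agree closely between the full and biased samples, while the constant estimate shifts because the biased sample over-represents x=0. This is only a toy case of the abstract's point that sensitivity depends on how well the learner's model space fits the data, not on the algorithm's name alone.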
Pages: 605-608
Page count: 4