Addressing sample selection bias for machine learning methods

Cited by: 0
Authors
Brewer, Dylan [1]
Carlson, Alyssa [2]
Affiliations
[1] Georgia Inst Technol, Sch Econ, Atlanta, GA 30332 USA
[2] Univ Missouri, Dept Econ, Columbia, MO USA
Keywords
control function; inverse probability weighting; machine learning; sample selection; semiparametric regression models; incumbency advantage; covariate shift; big data; inference; accountability
DOI
10.1002/jae.3029
Chinese Library Classification (CLC) number
F [Economics]
Subject classification code
02
Abstract
We study approaches for adjusting machine learning methods when the training sample differs from the prediction sample on unobserved dimensions. The machine learning literature predominantly assumes selection only on observed dimensions, and the common remedies for such selection on observables are to weight observations or to include the variables that influence selection. Simulation results show that selection on unobservables increases mean squared prediction error for popular machine learning algorithms. Common machine learning practices, such as weighting or including variables that influence selection into the training or prediction sample, often worsen sample selection bias. We propose two control function approaches that remove the effects of selection bias before training and find that they reduce mean squared prediction error in simulations. We apply these approaches to predicting the incumbent's vote share in gubernatorial elections using previously observed re-election bids. We find that ignoring selection on unobservables leads to substantially higher predicted vote shares for the incumbent than the control function approach yields.
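The abstract names control function approaches without spelling one out. As an illustration only, the sketch below implements the textbook Heckman-style two-step control function (a probit selection equation, then the inverse Mills ratio added as a regressor when training) on simulated data with selection on unobservables. It is not necessarily either of the two estimators the authors propose: the linear outcome model stands in for a machine learning learner, and the data-generating process, the coefficients, the excluded selection variable `z`, and the helper names (`fit_probit`, `inverse_mills`, `simulate`) are all hypothetical assumptions.

```python
import math
import numpy as np

# Standard normal pdf and cdf; math.erf is vectorized so only NumPy is needed.
_erf = np.vectorize(math.erf)

def norm_pdf(t):
    return np.exp(-0.5 * t**2) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    return 0.5 * (1.0 + _erf(t / math.sqrt(2.0)))

def fit_probit(W, s, iters=50, tol=1e-10):
    """Probit MLE by Fisher scoring. W: (n, k) regressors, s: (n,) 0/1 selection."""
    gamma = np.zeros(W.shape[1])
    for _ in range(iters):
        idx = W @ gamma
        cdf = np.clip(norm_cdf(idx), 1e-12, 1 - 1e-12)
        pdf = norm_pdf(idx)
        denom = cdf * (1.0 - cdf)
        score = W.T @ ((s - cdf) * pdf / denom)           # gradient of log-likelihood
        info = (W * (pdf**2 / denom)[:, None]).T @ W       # expected information
        step = np.linalg.solve(info, score)
        gamma += step
        if np.max(np.abs(step)) < tol:
            break
    return gamma

def inverse_mills(idx):
    """lambda(t) = phi(t) / Phi(t), the control function for selected units."""
    return norm_pdf(idx) / np.clip(norm_cdf(idx), 1e-12, None)

def simulate(n=20000, rho=0.7, seed=0):
    """Hypothetical DGP: outcome error u and selection error v are correlated."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    z = rng.normal(size=n)  # assumed to shift selection only (exclusion restriction)
    u, v = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n).T
    y = 1.0 + 2.0 * x + u                                   # outcome for everyone
    s = (0.2 + 1.0 * x + 1.0 * z + v > 0).astype(float)     # 1 = in training sample
    return x, z, y, s

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

x, z, y, s = simulate()
sel = s == 1
ones = np.ones_like(x)

# Naive: train on the selected sample and ignore selection.
beta_naive = ols(np.column_stack([ones, x])[sel], y[sel])

# Control function: step 1 probit for selection, step 2 add the inverse Mills ratio.
W = np.column_stack([ones, x, z])
gamma = fit_probit(W, s)
lam = inverse_mills(W @ gamma)
beta_cf = ols(np.column_stack([ones, x, lam])[sel], y[sel])

# Predict for the whole population: E[y | x] = b0 + b1 * x, because E[u] = 0
# there, so the control-function term is dropped at prediction time.
mse_naive = np.mean((y - (beta_naive[0] + beta_naive[1] * x)) ** 2)
mse_cf = np.mean((y - (beta_cf[0] + beta_cf[1] * x)) ** 2)
print(f"naive MSE: {mse_naive:.3f}   control-function MSE: {mse_cf:.3f}")
```

In this setup, units with larger `x` are more likely to be selected, so the naive fit absorbs the selection term into its intercept and slope; the control function term soaks that up during training and is then dropped when predicting for the full population.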
Pages: 383-400
Number of pages: 18
Related papers (50 in total)
  • [1] Sample size selection in optimization methods for machine learning
    Byrd, Richard H.
    Chin, Gillian M.
    Nocedal, Jorge
    Wu, Yuchen
    [J]. MATHEMATICAL PROGRAMMING, 2012, 134 (01) : 127 - 155
  • [3] Addressing machine learning bias to foster energy justice
    Chen, Chien-fei
    Napolitano, Rebecca
    Hu, Yuqing
    Kar, Bandana
    Yao, Bing
    [J]. ENERGY RESEARCH & SOCIAL SCIENCE, 2024, 116
  • [4] A selection of challenges in addressing selection bias
    Howards, Penelope P.
    Johnson, Candice Y.
    [J]. PAEDIATRIC AND PERINATAL EPIDEMIOLOGY, 2024, 38 (07) : 638 - 640
  • [5] Addressing maximization bias in reinforcement learning with two-sample testing
    Waltz, Martin
    Okhrin, Ostap
    [J]. ARTIFICIAL INTELLIGENCE, 2024, 336
  • [6] Double Machine Learning for Sample Selection Models
    Bia, Michela
    Huber, Martin
    Laffers, Lukas
    [J]. JOURNAL OF BUSINESS & ECONOMIC STATISTICS, 2024, 42 (03) : 958 - 969
  • [7] Selection principle for machine learning methods
    Univ of Sussex, Brighton, United Kingdom
    [J]. Neural Network World, 2 (231-239):
  • [8] SELECTIVITY BIAS CORRECTION METHODS IN POLYCHOTOMOUS SAMPLE SELECTION MODELS
    SCHMERTMANN, CP
    [J]. JOURNAL OF ECONOMETRICS, 1994, 60 (1-2) : 101 - 132
  • [9] Rehumanized Crowdsourcing: A Labeling Framework Addressing Bias and Ethics in Machine Learning
    Barbosa, Nata M.
    Chen, Monchu
    [J]. CHI 2019: PROCEEDINGS OF THE 2019 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, 2019,
  • [10] Learning With Imbalanced Noisy Data by Preventing Bias in Sample Selection
    Liu, Huafeng
    Sheng, Mengmeng
    Sun, Zeren
    Yao, Yazhou
    Hua, Xian-Sheng
    Shen, Heng-Tao
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 7426 - 7437