Sample selection bias in evaluation of prediction performance of causal models

Cited by: 3
Authors
Long, James P. [1]
Ha, Min Jin [1]
Affiliations
[1] University of Texas MD Anderson Cancer Center, Department of Biostatistics, Houston, TX 77030 USA
Keywords
causal inference; genetic perturbation experiments; prediction; sample selection bias; inference
DOI
10.1002/sam.11559
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Causal models are notoriously difficult to validate because they make untestable assumptions regarding confounding. New scientific experiments offer the possibility of evaluating causal models using prediction performance. Prediction performance measures are typically robust to violations of causal assumptions. However, prediction performance does depend on the selection of training and test sets. Biased training sets can lead to optimistic assessments of model performance. In this work, we revisit the prediction performance of several recently proposed causal models tested on a genetic perturbation data set of Kemmeren. We find that sample selection bias is likely a key driver of model performance. We propose using a less-biased evaluation set for assessing prediction performance and compare models on this new set. In this setting, the causal models have similar or worse performance compared to standard association-based estimators such as Lasso. Finally, we compare the performance of causal estimators in simulation studies that reproduce the Kemmeren structure of genetic knockout experiments but without any sample selection bias. These results provide an improved understanding of the performance of several causal models and offer guidance on how future studies should use Kemmeren.
Pages: 5-14
Number of pages: 10
Related papers
50 records in total
  • [1] Causal Feature Selection in the Presence of Sample Selection Bias
    Yang, Shuai
    Guo, Xianjie
    Yu, Kui
    Huang, Xiaoling
    Jiang, Tingting
    He, Jin
    Gu, Lichuan
    [J]. ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2023, 14 (05)
  • [2] MODELS FOR SAMPLE SELECTION BIAS
    WINSHIP, C
    MARE, RD
    [J]. ANNUAL REVIEW OF SOCIOLOGY, 1992, 18 : 327 - 350
  • [3] Heterogeneous Causal Effects and Sample Selection Bias
    Breen, Richard
    Choi, Seongsoo
    Holm, Anders
    [J]. SOCIOLOGICAL SCIENCE, 2015, 2 : 351 - 369
  • [4] Sample selection bias in models of commuting time
    Cooke, TJ
    Ross, SL
    [J]. URBAN STUDIES, 1999, 36 (09) : 1597 - 1611
  • [5] Sample selection bias in credit scoring models
    Banasik, J
    Crook, J
    Thomas, L
    [J]. JOURNAL OF THE OPERATIONAL RESEARCH SOCIETY, 2003, 54 (08) : 822 - 832
  • [6] Estimating models with sample selection bias: A survey
    Vella, F
    [J]. JOURNAL OF HUMAN RESOURCES, 1998, 33 (01) : 127 - 169
  • [7] Exploring Selection Bias by Causal Frailty Models: The Magnitude Matters
    Stensrud, Mats Julius
    Valberg, Morten
    Roysland, Kjetil
    Aalen, Odd O.
    [J]. EPIDEMIOLOGY, 2017, 28 (03) : 379 - 386
  • [8] Sample selection in models of academic performance
    Cushing, MJ
    McGarvey, MG
    [J]. ECONOMIC INQUIRY, 2004, 42 (02) : 319 - 322
  • [9] The performance of sample selection estimators to control for attrition bias
    Grasdal, A
    [J]. HEALTH ECONOMICS, 2001, 10 (05) : 385 - 398
  • [10] Sample selection bias in acquisition credit scoring models: an evaluation of the supplemental-data approach
    Barakova, Irina
    Glennon, Dennis
    Palvia, Ajay
    [J]. JOURNAL OF CREDIT RISK, 2013, 9 (03) : 77 - 117