Post-Contextual-Bandit Inference

Cited by: 0
Authors
Bibaut, Aurelien [1 ]
Dimakopoulou, Maria [1 ]
Kallus, Nathan [1 ,2 ]
Chambaz, Antoine [3 ]
van der Laan, Mark [4 ]
Affiliations
[1] Netflix, Los Gatos, CA 95032 USA
[2] Cornell Univ, Ithaca, NY 14853 USA
[3] Univ Paris, Paris, France
[4] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords
DOI
None
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking because they can both improve outcomes for study participants and increase the chance of identifying good or even best policies. Nevertheless, to support credible inference on novel interventions at the end of a study, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or the value of new policies. The adaptive nature of the data collected by contextual bandit algorithms, however, makes this difficult: standard estimators are no longer asymptotically normally distributed, and classic confidence intervals fail to provide correct coverage. While stabilized estimators address this in non-contextual settings, variance stabilization in the contextual setting poses unique challenges that we tackle for the first time in this paper. We propose the Contextual Adaptive Doubly Robust (CADR) estimator, a novel estimator for policy value that is asymptotically normal under contextual adaptive data collection. The main technical challenge in constructing CADR is designing adaptive and consistent conditional standard deviation estimators for stabilization. Extensive numerical experiments using 57 OpenML datasets demonstrate that confidence intervals based on CADR uniquely provide correct coverage.
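The abstract's core idea, a doubly robust policy-value score rescaled by an estimated conditional standard deviation so that a normal approximation applies under adaptive data collection, can be illustrated with a minimal sketch. Everything below is an assumption for illustration (a simulated two-arm bandit log, an oracle outcome model, and a crude plug-in standard-deviation estimate), not the authors' CADR implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000  # number of adaptively collected rounds

# Simulate an adaptive log: the propensity of arm 1 drifts over time,
# mimicking a bandit algorithm shifting exploration toward exploitation.
contexts = rng.normal(size=T)
true_q = lambda x, a: 0.5 * a + 0.2 * x  # toy mean-reward function

psi = np.empty(T)    # doubly robust scores for the policy "always play arm 1"
sigma = np.empty(T)  # estimated conditional standard deviations of the scores
for t in range(T):
    x = contexts[t]
    e1 = float(np.clip(0.5 + 0.4 * np.tanh(t / T - 0.5), 0.1, 0.9))  # propensity of arm 1
    a = int(rng.random() < e1)
    r = true_q(x, a) + rng.normal(scale=0.3)

    q_hat = true_q(x, 1)  # oracle outcome model for the target arm (a sketch assumption)
    # Doubly robust (AIPW-style) score for the value of "always play arm 1"
    psi[t] = q_hat + (a == 1) / e1 * (r - q_hat)
    # Crude conditional-std estimate: importance-weight scale times reward-noise scale
    sigma[t] = max(np.sqrt(1.0 / e1) * 0.3, 1e-3)

# Variance-stabilizing weights: downweight rounds with noisy scores so the
# weighted sum behaves like a martingale with predictable variance.
w = 1.0 / sigma
v_hat = np.sum(w * psi) / np.sum(w)   # stabilized policy-value estimate
half = 1.96 * np.sqrt(T) / np.sum(w)  # normal-approximation 95% CI half-width
print(f"V_hat = {v_hat:.3f} +/- {half:.3f}")
```

The stabilization step (dividing each score by its estimated conditional standard deviation before averaging) is what restores asymptotic normality when propensities change adaptively over time; the hard part the paper addresses is estimating those conditional standard deviations consistently from contextual, adaptively collected data.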
Pages: 12