Counterfactual Data-Fusion for Online Reinforcement Learners

Cited by: 0
Authors
Forney, Andrew [1]
Pearl, Judea [1]
Bareinboim, Elias [2]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90095 USA
[2] Purdue Univ, W Lafayette, IN 47907 USA
Funding
U.S. National Science Foundation;
Keywords
MULTIARMED BANDIT;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Multi-Armed Bandit problem with Unobserved Confounders (MABUC) considers decision-making settings where unmeasured variables can influence both the agent's decisions and received rewards (Bareinboim et al., 2015). Recent findings showed that unobserved confounders (UCs) pose a unique challenge to algorithms based on standard randomization (i.e., experimental data); if UCs are naively averaged out, these algorithms behave sub-optimally, possibly incurring infinite regret. In this paper, we show how counterfactual-based decision-making circumvents these problems and leads to a coherent fusion of observational and experimental data. We then demonstrate this new strategy in an enhanced Thompson Sampling bandit player, and support our findings' efficacy with extensive simulations.
Pages: 9
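Illustrative sketch (not from the paper)
The abstract's claim that randomization can average out an unobserved confounder can be illustrated with a toy simulation. The sketch below is not the authors' algorithm: the payout table, the function name `thompson_run`, and the assumption that the agent's natural intent equals the confounder are all hypothetical choices made for illustration. It compares a standard Bernoulli Thompson Sampling player with one that keeps a separate posterior for each observed intent.

```python
import random

# Hypothetical payout table P(reward = 1 | arm, u); the numbers are illustrative only.
PAYOUT = {(0, 0): 0.2, (1, 0): 0.8, (0, 1): 0.8, (1, 1): 0.2}


def thompson_run(rounds, use_intent, seed=0):
    """Return the average reward of a Bernoulli Thompson Sampling player.

    If use_intent is True, the player conditions on its own intent (the arm
    it would have picked naturally); otherwise it ignores intent entirely.
    """
    rng = random.Random(seed)
    contexts = (0, 1) if use_intent else (0,)
    # Beta(1, 1) prior per (context, arm), stored as [alpha, beta].
    posterior = {(c, a): [1, 1] for c in contexts for a in (0, 1)}
    total = 0
    for _ in range(rounds):
        u = rng.randint(0, 1)  # unobserved confounder, never shown to the player
        intent = u             # assumption: intent is fully driven by u
        c = intent if use_intent else 0
        # Draw one sample per arm from its posterior and play the largest.
        samples = {a: rng.betavariate(*posterior[(c, a)]) for a in (0, 1)}
        arm = max(samples, key=samples.get)
        reward = 1 if rng.random() < PAYOUT[(arm, u)] else 0
        posterior[(c, arm)][0 if reward else 1] += 1
        total += reward
    return total / rounds


if __name__ == "__main__":
    print("ignoring intent:", thompson_run(5000, use_intent=False))
    print("conditioning on intent:", thompson_run(5000, use_intent=True))
```

In this toy setup each arm pays 0.5 on average once the confounder is averaged out, so the intent-blind player cannot do better than 0.5, whereas a player that conditions on intent can learn to play against it and approach 0.8.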