Boosted Off-Policy Learning

Cited: 0
Authors: London, Ben [1]; Lu, Levi [1]; Sandler, Ted [2]; Joachims, Thorsten [1]
Affiliations:
[1] Amazon, Seattle, WA 98109 USA
[2] Groundlight AI, Seattle, WA USA
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
We propose the first boosting algorithm for off-policy learning from logged bandit feedback. Unlike existing boosting methods for supervised learning, our algorithm directly optimizes an estimate of the policy's expected reward. We analyze this algorithm and prove that the excess empirical risk decreases (possibly exponentially fast) with each round of boosting, provided a "weak" learning condition is satisfied by the base learner. We further show how to reduce the base learner to supervised learning, which opens up a broad range of readily available base learners with practical benefits, such as decision trees. Experiments indicate that our algorithm inherits many desirable properties of tree-based boosting algorithms (e.g., robustness to feature scaling and hyperparameter tuning), and that it can outperform off-policy learning with deep neural networks as well as methods that simply regress on the observed rewards.
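Since the record contains no code, the following is a minimal, hypothetical Python sketch of the general idea the abstract describes: functional gradient boosting that ascends an inverse-propensity-scored (IPS) estimate of a softmax policy's expected reward, with each round reduced to fitting decision-tree regressors (the supervised base learner) to the objective's gradient. This is not the authors' exact algorithm; the function name ips_boost and all hyperparameters are illustrative assumptions.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def ips_boost(X, actions, rewards, propensities, n_actions,
              n_rounds=50, learning_rate=0.1, max_depth=3):
    # Learn ensemble scores f(x, a); the induced policy is
    # pi(a|x) = softmax(f(x, .))[a].
    n = X.shape[0]
    idx = np.arange(n)
    scores = np.zeros((n, n_actions))
    ensemble = []
    for _ in range(n_rounds):
        # Current softmax policy probabilities.
        logits = scores - scores.max(axis=1, keepdims=True)
        pi = np.exp(logits)
        pi /= pi.sum(axis=1, keepdims=True)
        # IPS objective: (1/n) * sum_i r_i * pi(a_i|x_i) / mu_i, where mu_i
        # is the logging policy's propensity for the logged action a_i.
        # Its gradient w.r.t. the score f(x_i, a) is
        # (r_i / mu_i) * pi(a_i|x_i) * (1[a == a_i] - pi(a|x_i));
        # the constant 1/n is absorbed into the learning rate.
        w = rewards * pi[idx, actions] / propensities
        grad = -pi * w[:, None]
        grad[idx, actions] += w
        # Reduction to supervised learning: fit one regression tree per
        # action to the gradient, then take a gradient-ascent step.
        round_trees = []
        for a in range(n_actions):
            tree = DecisionTreeRegressor(max_depth=max_depth)
            tree.fit(X, grad[:, a])
            scores[:, a] += learning_rate * tree.predict(X)
            round_trees.append(tree)
        ensemble.append(round_trees)
    return ensemble

At deployment, a new context's scores would be the sum of learning_rate-weighted tree predictions across all rounds, and an action can be sampled from (or chosen greedily under) the resulting softmax.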
Pages: 27
Related Papers
50 items in total (entries [41]-[50] shown below)
  • [41] Fujimoto, Scott; Meger, David; Precup, Doina. Off-Policy Deep Reinforcement Learning without Exploration. International Conference on Machine Learning, Vol. 97, 2019.
  • [42] Chen, Xingguo; Ma, Xingzhou; Li, Yang; Yang, Guang; Yang, Shangdong; Gao, Yang. Modified Retrace for Off-Policy Temporal Difference Learning. Uncertainty in Artificial Intelligence, 2023, 216: 303-312.
  • [43] Yang, Shangdong; Sun, Dingyuanhao; Chen, Xingguo. Off-Policy Temporal Difference Learning with Bellman Residuals. Mathematics, 2024, 12 (22).
  • [44] Zhu, Ruihao; Kveton, Branislav. Safe Optimal Design with Applications in Off-Policy Learning. International Conference on Artificial Intelligence and Statistics, Vol. 151, 2022.
  • [45] Tschernutter, Daniel; Hatt, Tobias; Feuerriegel, Stefan. Interpretable Off-Policy Learning via Hyperbox Search. International Conference on Machine Learning, Vol. 162, 2022.
  • [46] Jeunen, Olivier; Goethals, Bart. Pessimistic Reward Models for Off-Policy Learning in Recommendation. 15th ACM Conference on Recommender Systems (RecSys 2021), 2021: 63-74.
  • [47] Yu, Jiayu; Li, Jingyao; Lu, Shuai; Han, Shuai. Mixed Experience Sampling for Off-Policy Reinforcement Learning. Expert Systems with Applications, 2024, 251.
  • [48] Vannella, Filippo; Jeong, Jaeseong; Proutiere, Alexandre. Off-Policy Learning for Remote Electrical Tilt Optimization. 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall), 2020.
  • [49] Rowland, Mark; Dabney, Will; Munos, Remi. Adaptive Trade-Offs in Off-Policy Learning. International Conference on Artificial Intelligence and Statistics, Vol. 108, 2020: 34-43.
  • [50] Luo, Biao; Wu, Huai-Ning; Huang, Tingwen. Off-Policy Reinforcement Learning for H∞ Control Design. IEEE Transactions on Cybernetics, 2015, 45 (01): 65-76.