Boosted Off-Policy Learning

Cited by: 0
Authors
London, Ben [1 ]
Lu, Levi [1 ]
Sandler, Ted [2 ]
Joachims, Thorsten [1 ]
Affiliations
[1] Amazon, Seattle, WA 98109 USA
[2] Groundlight AI, Seattle, WA USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose the first boosting algorithm for off-policy learning from logged bandit feedback. Unlike existing boosting methods for supervised learning, our algorithm directly optimizes an estimate of the policy's expected reward. We analyze this algorithm and prove that the excess empirical risk decreases (possibly exponentially fast) with each round of boosting, provided a "weak" learning condition is satisfied by the base learner. We further show how to reduce the base learner to supervised learning, which opens up a broad range of readily available base learners with practical benefits, such as decision trees. Experiments indicate that our algorithm inherits many desirable properties of tree-based boosting algorithms (e.g., robustness to feature scaling and hyperparameter tuning), and that it can outperform off-policy learning with deep neural networks as well as methods that simply regress on the observed rewards.
Pages: 27
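As a rough illustration of the approach the abstract describes, the following is a minimal Python sketch, under my own assumptions, of a boosting loop that directly optimizes an inverse-propensity-scored (IPS) estimate of a softmax policy's expected reward, with each round reduced to supervised regression trees. It is not the authors' implementation; the function names (boosted_off_policy, softmax), the softmax policy class, and all hyperparameters are illustrative.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def softmax(scores):
        # Numerically stable softmax over the action dimension.
        z = scores - scores.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def boosted_off_policy(X, actions, propensities, rewards, n_actions,
                           rounds=50, step=0.5, max_depth=3):
        # Logged bandit data: contexts X (n x d), logged actions a_i, the
        # logging policy's propensities p_i = pi_0(a_i | x_i), rewards r_i.
        n = len(X)
        f = np.zeros((n, n_actions))   # ensemble scores f(x, a)
        ensemble = []
        w = rewards / propensities     # per-example IPS weights r_i / p_i
        for _ in range(rounds):
            pi = softmax(f)            # softmax policy induced by the scores
            pi_logged = pi[np.arange(n), actions]
            # Functional gradient of the IPS objective
            #   V(pi) = (1/n) * sum_i w_i * pi(a_i | x_i)
            # w.r.t. the scores; the 1/n constant is absorbed into `step`.
            grad = -pi * (w * pi_logged)[:, None]
            grad[np.arange(n), actions] += w * pi_logged
            # Supervised-learning reduction: one regression tree per action,
            # fit to the gradient, added to the ensemble with a fixed step.
            round_trees = []
            for a in range(n_actions):
                tree = DecisionTreeRegressor(max_depth=max_depth)
                tree.fit(X, grad[:, a])
                f[:, a] += step * tree.predict(X)
                round_trees.append(tree)
            ensemble.append(round_trees)
        return ensemble

This sketch uses a fixed step size for simplicity; the paper's guarantee (excess empirical risk decreasing, possibly exponentially fast, under a weak-learning condition on the base learner) concerns its own choice of base learner and step, which the abstract does not spell out.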
Related Papers
50 records in total
  • [31] Batch Reinforcement Learning With a Nonparametric Off-Policy Policy Gradient
    Tosatto, Samuele
    Carvalho, Joao
    Peters, Jan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (10) : 5996 - 6010
  • [32] Off-policy and on-policy reinforcement learning with the Tsetlin machine
    Gorji, Saeed Rahimi
    Granmo, Ole-Christoffer
    APPLIED INTELLIGENCE, 2023, 53 (08) : 8596 - 8613
  • [33] Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective
    Zhang, Zeyu
    Su, Yi
    Yuan, Hui
    Wu, Yiran
    Balasubramanian, Rishab
    Wu, Qingyun
    Wang, Huazheng
    Wang, Mengdi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [34] Off-policy Imitation Learning from Visual Inputs
    Cheng, Zhihao
    Shen, Li
    Tao, Dacheng
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA, 2023, : 2937 - 2943
  • [35] Towards Robust Off-Policy Learning for Runtime Uncertainty
    Xu, Da
    Ye, Yuting
    Ruan, Chuanwei
    Yang, Bo
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 10101 - 10109
  • [36] Approximation spaces in off-policy Monte Carlo learning
    Peters, James F.
    Henry, Christopher
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2007, 20 (05) : 667 - 675
  • [37] Scaling Life-long Off-policy Learning
    White, Adam
    Modayil, Joseph
    Sutton, Richard S.
    2012 IEEE INTERNATIONAL CONFERENCE ON DEVELOPMENT AND LEARNING AND EPIGENETIC ROBOTICS (ICDL), 2012,
  • [38] Counterfactual experience augmented off-policy reinforcement learning
    Lee, Sunbowen
    Gong, Yicheng
    Deng, Chao
    NEUROCOMPUTING, 2025, 637
  • [39] Off-policy Learning over Heterogeneous Information for Recommendation
    Wang, Xiangmeng
    Li, Qian
    Yu, Dianer
    Xu, Guandong
    PROCEEDINGS OF THE ACM WEB CONFERENCE 2022 (WWW'22), 2022, : 2348 - 2359
  • [40] Flexible Data Augmentation in Off-Policy Reinforcement Learning
    Rak, Alexandra
    Skrynnik, Alexey
    Panov, Aleksandr I.
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING (ICAISC 2021), PT I, 2021, 12854 : 224 - 235