Boosted Off-Policy Learning

Cited by: 0
Authors
London, Ben [1]
Lu, Levi [1]
Sandler, Ted [2]
Joachims, Thorsten [1]
Affiliations
[1] Amazon, Seattle, WA 98109 USA
[2] Groundlight AI, Seattle, WA USA
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
We propose the first boosting algorithm for off-policy learning from logged bandit feedback. Unlike existing boosting methods for supervised learning, our algorithm directly optimizes an estimate of the policy's expected reward. We analyze this algorithm and prove that the excess empirical risk decreases (possibly exponentially fast) with each round of boosting, provided a "weak" learning condition is satisfied by the base learner. We further show how to reduce the base learner to supervised learning, which opens up a broad range of readily available base learners with practical benefits, such as decision trees. Experiments indicate that our algorithm inherits many desirable properties of tree-based boosting algorithms (e.g., robustness to feature scaling and hyperparameter tuning), and that it can outperform off-policy learning with deep neural networks as well as methods that simply regress on the observed rewards.
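To make the idea concrete, the sketch below illustrates one plausible form of such a boosting loop: functional gradient boosting of a softmax policy toward the inverse-propensity-score (IPS) estimate of expected reward, with regression trees as the supervised base learner. This is a minimal illustration under assumed conventions (logged tuples of context, action, reward, and logging propensity; the function name boost_policy; one tree per action per round), not the authors' exact algorithm or reduction.

```python
# Minimal sketch (not the paper's exact algorithm): boost a softmax policy
# to maximize the IPS estimate (1/n) * sum_i r_i * pi(a_i | x_i) / p0_i
# on logged bandit feedback, using regression trees as base learners.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def softmax(scores):
    # Numerically stable row-wise softmax.
    z = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def boost_policy(X, a, r, p0, n_actions, rounds=50, lr=0.1, depth=3):
    # X: (n, d) contexts; a: (n,) logged actions (int); r: (n,) observed
    # rewards; p0: (n,) logging propensities pi_0(a_i | x_i).
    n = len(X)
    F = np.zeros((n, n_actions))          # additive ensemble scores f(x, .)
    trees = []
    for _ in range(rounds):
        pi = softmax(F)                   # current policy pi(. | x)
        w = r / p0                        # importance-weighted rewards
        pi_a = pi[np.arange(n), a]        # pi(a_i | x_i)
        # Gradient of the IPS objective w.r.t. F[i, k]:
        #   (r_i / p0_i) * pi(a_i|x_i) * (1[k == a_i] - pi(k|x_i))
        grad = -pi * (w * pi_a)[:, None]
        grad[np.arange(n), a] += w * pi_a
        # Fit one regression tree per action to the functional gradient
        # and take a small step in that direction.
        round_trees = []
        for k in range(n_actions):
            t = DecisionTreeRegressor(max_depth=depth).fit(X, grad[:, k])
            F[:, k] += lr * t.predict(X)
            round_trees.append(t)
        trees.append(round_trees)
    return trees
```

To act with the learned ensemble on a new context, one would accumulate lr * tree.predict over all rounds for each action and then take the argmax (or sample from the resulting softmax); the tree depth, learning rate, and number of rounds play the same roles as in standard tree-based gradient boosting.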
Pages: 27