Boosted Off-Policy Learning

Cited by: 0
Authors
London, Ben [1 ]
Lu, Levi [1 ]
Sandler, Ted [2 ]
Joachims, Thorsten [1 ]
Affiliations
[1] Amazon, Seattle, WA 98109 USA
[2] Groundlight AI, Seattle, WA USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose the first boosting algorithm for off-policy learning from logged bandit feedback. Unlike existing boosting methods for supervised learning, our algorithm directly optimizes an estimate of the policy's expected reward. We analyze this algorithm and prove that the excess empirical risk decreases (possibly exponentially fast) with each round of boosting, provided a "weak" learning condition is satisfied by the base learner. We further show how to reduce the base learner to supervised learning, which opens up a broad range of readily available base learners with practical benefits, such as decision trees. Experiments indicate that our algorithm inherits many desirable properties of tree-based boosting algorithms (e.g., robustness to feature scaling and hyperparameter tuning), and that it can outperform off-policy learning with deep neural networks as well as methods that simply regress on the observed rewards.
Pages: 27
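As a rough illustration of the approach the abstract describes, the following is a minimal Python sketch, under my own assumptions, of a boosting loop that directly optimizes an inverse-propensity-scored (IPS) estimate of a softmax policy's expected reward, with each round reduced to supervised regression trees. It is not the authors' implementation; the function names (boosted_off_policy, softmax), the softmax policy class, and all hyperparameters are illustrative.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def softmax(scores):
        # Numerically stable softmax over the action dimension.
        z = scores - scores.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def boosted_off_policy(X, actions, propensities, rewards, n_actions,
                           rounds=50, step=0.5, max_depth=3):
        # Logged bandit data: contexts X (n x d), logged actions a_i, the
        # logging policy's propensities p_i = pi_0(a_i | x_i), rewards r_i.
        n = len(X)
        f = np.zeros((n, n_actions))   # ensemble scores f(x, a)
        ensemble = []
        w = rewards / propensities     # per-example IPS weights r_i / p_i
        for _ in range(rounds):
            pi = softmax(f)            # softmax policy induced by the scores
            pi_logged = pi[np.arange(n), actions]
            # Functional gradient of the IPS objective
            #   V(pi) = (1/n) * sum_i w_i * pi(a_i | x_i)
            # w.r.t. the scores; the 1/n constant is absorbed into `step`.
            grad = -pi * (w * pi_logged)[:, None]
            grad[np.arange(n), actions] += w * pi_logged
            # Supervised-learning reduction: one regression tree per action,
            # fit to the gradient, added to the ensemble with a fixed step.
            round_trees = []
            for a in range(n_actions):
                tree = DecisionTreeRegressor(max_depth=max_depth)
                tree.fit(X, grad[:, a])
                f[:, a] += step * tree.predict(X)
                round_trees.append(tree)
            ensemble.append(round_trees)
        return ensemble

This sketch uses a fixed step size for simplicity; the paper's guarantee (excess empirical risk decreasing, possibly exponentially fast, under a weak-learning condition on the base learner) concerns its own choice of base learner and step, which the abstract does not spell out.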
Related Papers
50 records in total
  • [31] Batch Reinforcement Learning With a Nonparametric Off-Policy Policy Gradient
    Tosatto, Samuele
    Carvalho, Joao
    Peters, Jan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (10) : 5996 - 6010
  • [32] Off-policy and on-policy reinforcement learning with the Tsetlin machine
    Gorji, Saeed Rahimi
    Granmo, Ole-Christoffer
    APPLIED INTELLIGENCE, 2023, 53 (08) : 8596 - 8613
  • [33] Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective
    Zhang, Zeyu
    Su, Yi
    Yuan, Hui
    Wu, Yiran
    Balasubramanian, Rishab
    Wu, Qingyun
    Wang, Huazheng
    Wang, Mengdi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [34] Off-policy Imitation Learning from Visual Inputs
    Cheng, Zhihao
    Shen, Li
    Tao, Dacheng
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA, 2023, : 2937 - 2943
  • [35] Towards Robust Off-Policy Learning for Runtime Uncertainty
    Xu, Da
    Ye, Yuting
    Ruan, Chuanwei
    Yang, Bo
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 10101 - 10109
  • [36] Approximation spaces in off-policy Monte Carlo learning
    Peters, James F.
    Henry, Christopher
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2007, 20 (05) : 667 - 675
  • [37] Scaling Life-long Off-policy Learning
    White, Adam
    Modayil, Joseph
    Sutton, Richard S.
    2012 IEEE INTERNATIONAL CONFERENCE ON DEVELOPMENT AND LEARNING AND EPIGENETIC ROBOTICS (ICDL), 2012,
  • [38] Counterfactual experience augmented off-policy reinforcement learning
    Lee, Sunbowen
    Gong, Yicheng
    Deng, Chao
    NEUROCOMPUTING, 2025, 637
  • [39] Off-policy Learning over Heterogeneous Information for Recommendation
    Wang, Xiangmeng
    Li, Qian
    Yu, Dianer
    Xu, Guandong
    PROCEEDINGS OF THE ACM WEB CONFERENCE 2022 (WWW'22), 2022, : 2348 - 2359
  • [40] Flexible Data Augmentation in Off-Policy Reinforcement Learning
    Rak, Alexandra
    Skrynnik, Alexey
    Panov, Aleksandr I.
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING (ICAISC 2021), PT I, 2021, 12854 : 224 - 235