A Reduction from Reinforcement Learning to No-Regret Online Learning

Cited: 0
Authors
Cheng, Ching-An [1 ]
des Combes, Remi Tachet [2 ]
Boots, Byron [3 ]
Gordon, Geoff [2 ]
Affiliations
[1] Georgia Tech, Atlanta, GA 30332 USA
[2] Microsoft Res, Redmond, WA USA
[3] Univ Washington, Seattle, WA 98195 USA
Keywords
CONVERGENCE; BIFUNCTIONS;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
We present a reduction from reinforcement learning (RL) to no-regret online learning based on the saddle-point formulation of RL, by which any online algorithm with sublinear regret can generate policies with provable performance guarantees. This new perspective decouples the RL problem into two parts: regret minimization and function approximation. The first part admits a standard online-learning analysis, and the second part can be quantified independently of the learning algorithm. Therefore, the proposed reduction can be used as a tool to systematically design new RL algorithms. We demonstrate this idea by devising a simple RL algorithm based on mirror descent and the generative-model oracle. For any γ-discounted tabular RL problem, with probability at least 1 − δ, it learns an ε-optimal policy using at most O(|S||A| log(1/δ) / ((1 − γ)^4 ε^2)) samples. Furthermore, this algorithm admits a direct extension to linearly parameterized function approximators for large-scale applications, with computation and sample complexities independent of |S| and |A|, though at the cost of potential approximation bias.
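The mirror-descent component named in the abstract can be illustrated with a minimal, self-contained toy sketch (an illustration of the no-regret ingredient, not the paper's algorithm): entropy-regularized mirror descent over the simplex, i.e., the Hedge update, run on a fixed loss sequence. Its regret against the best fixed action in hindsight stays below the standard √(2 T ln n) bound, which is exactly the kind of sublinear-regret guarantee the reduction converts into a policy performance guarantee.

```python
import math
import random

def hedge(losses_per_round, eta):
    """Mirror descent with an entropy regularizer (Hedge) over n actions.

    losses_per_round: list of per-round loss vectors with entries in [0, 1].
    Returns the regret of the algorithm against the best fixed action.
    """
    n = len(losses_per_round[0])
    w = [1.0] * n            # unnormalized weights
    cum = [0.0] * n          # cumulative loss of each fixed action
    total_loss = 0.0         # cumulative expected loss of the algorithm
    for losses in losses_per_round:
        z = sum(w)
        p = [wi / z for wi in w]                      # play the current distribution
        total_loss += sum(pi * li for pi, li in zip(p, losses))
        for i, li in enumerate(losses):               # exponentiated-gradient update
            cum[i] += li
            w[i] *= math.exp(-eta * li)
    return total_loss - min(cum)

random.seed(0)
T, n = 2000, 5
losses = [[random.random() for _ in range(n)] for _ in range(T)]
eta = math.sqrt(2 * math.log(n) / T)      # standard tuning for horizon T
regret = hedge(losses, eta)
bound = math.sqrt(2 * T * math.log(n))    # regret bound, sublinear in T
print(regret <= bound)
```

The bound holds deterministically for any loss sequence in [0, 1], so average regret decays at rate O(√(ln n / T)); in the paper's reduction, such a rate for the online players translates into the stated sample-complexity guarantee.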
Pages: 3514–3523
Page count: 10
Related papers
50 in total
  • [31] No-Regret Learning with Unbounded Losses: The Case of Logarithmic Pooling
    Neyman, Eric
    Roughgarden, Tim
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [32] On the Impossibility of Convergence of Mixed Strategies with Optimal No-Regret Learning
    Muthukumar, Vidya
    Phade, Soham
    Sahai, Anant
    MATHEMATICS OF OPERATIONS RESEARCH, 2024,
  • [33] No-regret learning for repeated concave games with lossy bandits
    Liu, Wenting
    Lei, Jinlong
    Yi, Peng
    2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021, : 936 - 941
  • [34] Near-Optimal No-Regret Learning in General Games
    Daskalakis, Constantinos
    Fishelson, Maxwell
    Golowich, Noah
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [35] Generative Hybrid Representations for Activity Forecasting with No-Regret Learning
    Guan, Jiaqi
    Yuan, Ye
    Kitani, Kris M.
    Rhinehart, Nicholas
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 170 - 179
  • [36] No-Regret Learning and Mixed Nash Equilibria: They Do Not Mix
    Flokas, Lampros
    Vlatakis-Gkaragkounis, Emmanouil V.
    Lianeas, Thanasis
    Mertikopoulos, Panayotis
    Piliouras, Georgios
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [37] Tighter Robust Upper Bounds for Options via No-Regret Learning
    Xue, Shan
    Du, Ye
    Xu, Liang
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 4, 2023, : 5348 - 5356
  • [38] Optimal No-Regret Learning in Repeated First-Price Auctions
    Han, Yanjun
    Weissman, Tsachy
    Zhou, Zhengyuan
    OPERATIONS RESEARCH, 2025, 73 (01)
  • [39] No-Regret Online Prediction with Strategic Experts
    Sadeghi, Omid
    Fazel, Maryam
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [40] No-Regret Learning for Stackelberg Equilibrium Computation in Newsvendor Pricing Games
    Liu, Larkin
    Rong, Yuming
    ALGORITHMIC DECISION THEORY, ADT 2024, 2025, 15248 : 297 - 297