A Reduction from Reinforcement Learning to No-Regret Online Learning

Cited by: 0
Authors
Cheng, Ching-An [1]
des Combes, Remi Tachet [2]
Boots, Byron [3]
Gordon, Geoff [2]
Affiliations
[1] Georgia Tech, Atlanta, GA 30332 USA
[2] Microsoft Res, Redmond, WA USA
[3] Univ Washington, Seattle, WA 98195 USA
Keywords
CONVERGENCE; BIFUNCTIONS;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a reduction from reinforcement learning (RL) to no-regret online learning based on the saddle-point formulation of RL, by which any online algorithm with sublinear regret can generate policies with provable performance guarantees. This new perspective decouples the RL problem into two parts: regret minimization and function approximation. The first part admits a standard online-learning analysis, and the second part can be quantified independently of the learning algorithm. Therefore, the proposed reduction can be used as a tool to systematically design new RL algorithms. We demonstrate this idea by devising a simple RL algorithm based on mirror descent and the generative-model oracle. For any $\gamma$-discounted tabular RL problem, with probability at least $1 - \delta$, it learns an $\epsilon$-optimal policy using at most $O\left(\frac{|S||A| \log(1/\delta)}{(1-\gamma)^4 \epsilon^2}\right)$ samples. Furthermore, this algorithm admits a direct extension to linearly parameterized function approximators for large-scale applications, with computation and sample complexities independent of $|S|$, $|A|$, though at the cost of potential approximation bias.
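To make the scaling in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The function names `sample_bound` and `entropic_mirror_descent_step` are hypothetical. The first evaluates only the leading term of the stated sample bound, with the constant hidden by the O-notation assumed to be 1. The second shows one step of entropic mirror descent (exponentiated gradient) on the probability simplex, a standard no-regret update used here as a generic illustration; the abstract does not specify which mirror map the paper's algorithm uses.

```python
import math


def sample_bound(num_states, num_actions, gamma, epsilon, delta):
    """Leading term |S||A| log(1/delta) / ((1 - gamma)^4 * epsilon^2).

    Assumption: the constant hidden by the O-notation is taken to be 1,
    so this gives the scaling of the bound, not an exact sample count.
    """
    numerator = num_states * num_actions * math.log(1.0 / delta)
    denominator = (1.0 - gamma) ** 4 * epsilon ** 2
    return numerator / denominator


def entropic_mirror_descent_step(weights, losses, step_size):
    """One exponentiated-gradient update on the probability simplex.

    Assumption: a generic no-regret update chosen for illustration,
    not necessarily the update analyzed in the paper.
    """
    scaled = [w * math.exp(-step_size * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


if __name__ == "__main__":
    # Halving epsilon multiplies the leading term by four.
    coarse = sample_bound(10, 2, gamma=0.9, epsilon=0.1, delta=0.05)
    fine = sample_bound(10, 2, gamma=0.9, epsilon=0.05, delta=0.05)
    print(fine / coarse)

    # Probability mass shifts toward the action with the lower loss.
    print(entropic_mirror_descent_step([0.5, 0.5], [1.0, 0.0], step_size=1.0))
```

The leading term grows linearly in $|S||A|$ but with the fourth power of $1/(1-\gamma)$, so in this sketch the effective horizon dominates the estimate long before the size of the state space does.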
Pages: 3514-3523
Number of pages: 10