A Reduction from Reinforcement Learning to No-Regret Online Learning

Cited by: 0
Authors
Cheng, Ching-An [1]
des Combes, Remi Tachet [2]
Boots, Byron [3]
Gordon, Geoff [2]
Affiliations
[1] Georgia Tech, Atlanta, GA 30332 USA
[2] Microsoft Res, Redmond, WA USA
[3] Univ Washington, Seattle, WA 98195 USA
Keywords
CONVERGENCE; BIFUNCTIONS;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a reduction from reinforcement learning (RL) to no-regret online learning based on the saddle-point formulation of RL, by which any online algorithm with sublinear regret can generate policies with provable performance guarantees. This new perspective decouples the RL problem into two parts: regret minimization and function approximation. The first part admits a standard online-learning analysis, and the second part can be quantified independently of the learning algorithm. Therefore, the proposed reduction can be used as a tool to systematically design new RL algorithms. We demonstrate this idea by devising a simple RL algorithm based on mirror descent and the generative-model oracle. For any $\gamma$-discounted tabular RL problem, with probability at least $1 - \delta$, it learns an $\epsilon$-optimal policy using at most $O\left(\frac{|S||A| \log(1/\delta)}{(1-\gamma)^4 \epsilon^2}\right)$ samples. Furthermore, this algorithm admits a direct extension to linearly parameterized function approximators for large-scale applications, with computation and sample complexities independent of $|S|$ and $|A|$, though at the cost of potential approximation bias.
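The reduction described in the abstract rests on an online learner whose regret grows sublinearly. As a rough illustration of that property only, the sketch below runs entropic mirror descent (exponentiated-gradient updates) on the probability simplex against a sequence of loss vectors and compares the measured regret with the standard $\sqrt{T \ln(n) / 2}$ bound for losses in $[0, 1]$. This is not the algorithm from the paper: the saddle-point objective and the generative-model oracle are not reproduced, and the function names, the random loss sequence, and the step size are assumptions chosen for the example.

```python
import math
import random


def entropic_mirror_descent_regret(losses, eta):
    """Play exponentiated-gradient updates on the simplex.

    `losses` is a list of loss vectors with entries in [0, 1].
    Returns the learner's cumulative expected loss minus the
    cumulative loss of the best fixed coordinate in hindsight.
    """
    n = len(losses[0])
    log_weights = [0.0] * n      # log-domain weights for numerical stability
    cumulative = [0.0] * n       # cumulative loss of each fixed coordinate
    learner_loss = 0.0
    for loss in losses:
        # Normalize the weights into a distribution over coordinates.
        shift = max(log_weights)
        weights = [math.exp(v - shift) for v in log_weights]
        total = sum(weights)
        p = [w / total for w in weights]
        # The learner suffers the linear loss <p, loss>.
        learner_loss += sum(pi * li for pi, li in zip(p, loss))
        # Mirror-descent step with the entropy regularizer.
        for i in range(n):
            log_weights[i] -= eta * loss[i]
            cumulative[i] += loss[i]
    return learner_loss - min(cumulative)


def regret_bound(T, n):
    """Standard bound for this update with eta = sqrt(8 ln(n) / T)."""
    return math.sqrt(T * math.log(n) / 2)


# Hypothetical loss sequence: T rounds, n coordinates, uniform in [0, 1).
rng = random.Random(0)
T, n = 2000, 5
losses = [[rng.random() for _ in range(n)] for _ in range(T)]
eta = math.sqrt(8 * math.log(n) / T)
regret = entropic_mirror_descent_regret(losses, eta)
print(f"regret = {regret:.3f}, bound = {regret_bound(T, n):.3f}")
```

Because the bound grows like $\sqrt{T}$, the average regret per round shrinks toward zero as $T$ increases, which is the property the reduction requires of the online learner.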
Pages: 3514-3523
Page count: 10
Related papers
50 in total
  • [41] Liquid Welfare Guarantees for No-Regret Learning in Sequential Budgeted Auctions
    Fikioris, Giannis
    Tardos, Eva
    MATHEMATICS OF OPERATIONS RESEARCH, 2024,
  • [42] Distributed No-Regret Learning for Stochastic Aggregative Games over Networks
    Lei, Jinlong
    Yi, Peng
    Li, Li
    2021 PROCEEDINGS OF THE 40TH CHINESE CONTROL CONFERENCE (CCC), 2021, : 7512 - 7519
  • [43] No-regret algorithms in on-line learning, games and convex optimization
    Sorin, Sylvain
    MATHEMATICAL PROGRAMMING, 2024, 203 (1-2) : 645 - 686
  • [44] No-Regret Learning in Bilateral Trade via Global Budget Balance
    Bernasconi, Martino
    Castiglioni, Matteo
    Celli, Andrea
    Fusco, Federico
    PROCEEDINGS OF THE 56TH ANNUAL ACM SYMPOSIUM ON THEORY OF COMPUTING, STOC 2024, 2024, : 247 - 258
  • [45] On the Convergence of No-Regret Learning Dynamics in Time-Varying Games
    Anagnostides, Ioannis
    Panageas, Ioannis
    Farina, Gabriele
    Sandholm, Tuomas
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [46] No-Regret Distributed Learning in Subnetwork Zero-Sum Games
    Huang, Shijie
    Lei, Jinlong
    Hong, Yiguang
    Shanbhag, Uday V.
    Chen, Jie
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (10) : 6620 - 6635
  • [47] No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium
    Celli, Andrea
    Marchesi, Alberto
    Farina, Gabriele
    Gatti, Nicola
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [48] Online Learning with Transductive Regret
    Mohri, Mehryar
    Yang, Scott
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [49] Distributed No-Regret Learning in Aggregative Games With Residual Bandit Feedback
    Liu, Wenting
    Yi, Peng
    IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2024, 11 (4) : 1734 - 1745
  • [50] No-regret algorithms in on-line learning, games and convex optimization
    Sylvain Sorin
    Mathematical Programming, 2024, 203 : 645 - 686