A Reduction from Reinforcement Learning to No-Regret Online Learning

Cited by: 0
Authors
Cheng, Ching-An [1 ]
Tachet des Combes, Remi [2]
Boots, Byron [3 ]
Gordon, Geoff [2 ]
Affiliations
[1] Georgia Institute of Technology, Atlanta, GA 30332, USA
[2] Microsoft Research, Redmond, WA, USA
[3] University of Washington, Seattle, WA 98195, USA
Keywords
CONVERGENCE; BIFUNCTIONS
DOI
None available
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present a reduction from reinforcement learning (RL) to no-regret online learning based on the saddle-point formulation of RL, by which any online algorithm with sublinear regret can generate policies with provable performance guarantees. This new perspective decouples the RL problem into two parts: regret minimization and function approximation. The first part admits a standard online-learning analysis, and the second part can be quantified independently of the learning algorithm. Therefore, the proposed reduction can be used as a tool to systematically design new RL algorithms. We demonstrate this idea by devising a simple RL algorithm based on mirror descent and a generative-model oracle. For any $\gamma$-discounted tabular RL problem, with probability at least $1-\delta$, it learns an $\epsilon$-optimal policy using at most $O\!\left(\frac{|S||A|\log(1/\delta)}{(1-\gamma)^4\epsilon^2}\right)$ samples. Furthermore, this algorithm admits a direct extension to linearly parameterized function approximators for large-scale applications, with computation and sample complexities independent of $|S|$ and $|A|$, though at the cost of potential approximation bias.
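The abstract's algorithm builds on the classic linear-programming saddle-point form of a discounted MDP, with an online learner updating an occupancy-measure variable against a value-function variable. As a rough illustration only, and not the authors' exact method, the sketch below runs deterministic mirror descent on that bilinear saddle point for a small random tabular MDP, using exact gradients in place of the paper's generative-model samples; the MDP sizes, step sizes, and variable names are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (assumptions, not the paper's exact algorithm):
# mirror descent on the bilinear Lagrangian of the LP form of a
# gamma-discounted tabular MDP,
#   max_{mu in simplex(S x A)}  min_{v in [0, 1/(1-gamma)]^S}
#     (1-gamma) <p0, v> + sum_{s,a} mu(s,a) * (r(s,a) + gamma*(Pv)(s,a) - v(s)),
# run with exact gradients on a toy random MDP.

rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
v_max = 1.0 / (1.0 - gamma)                      # value bound for rewards in [0, 1]

P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)                # P[s, a, s'] = transition probs
r = rng.random((S, A))                           # rewards in [0, 1]
p0 = np.full(S, 1.0 / S)                         # initial state distribution

mu = np.full((S, A), 1.0 / (S * A))              # occupancy-measure iterate (simplex)
v = np.zeros(S)                                  # value-function iterate (box)
mu_avg = np.zeros_like(mu)
T = 5000

for t in range(1, T + 1):
    eta = 1.0 / np.sqrt(t)                       # decaying step size (illustrative)

    # Gradient in mu: the "advantage" residual r(s,a) + gamma*E[v(s')] - v(s).
    g_mu = r + gamma * (P @ v) - v[:, None]

    # Gradient in v of the Lagrangian (we minimize over v).
    g_v = (1.0 - gamma) * p0 + gamma * np.einsum('sa,sap->p', mu, P) - mu.sum(axis=1)

    # Entropic mirror ascent (exponentiated gradient) keeps mu on the simplex.
    mu = mu * np.exp(eta * g_mu / v_max)
    mu /= mu.sum()

    # Projected gradient descent keeps v in the box [0, v_max]^S.
    v = np.clip(v - eta * v_max * g_v, 0.0, v_max)

    mu_avg += mu

mu_avg /= T
pi = mu_avg / mu_avg.sum(axis=1, keepdims=True)  # policy: pi(a|s) proportional to mu_avg(s,a)
print("greedy actions per state:", pi.argmax(axis=1))
```

In the paper's setting, the exact gradients above would be replaced by unbiased estimates built from generative-model samples, and the regret of the two online players would bound the suboptimality of the averaged policy, which is what yields the stated sample complexity.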
Pages: 3514-3523
Number of pages: 10