Finite Sample Analysis of Average-Reward TD Learning and Q-Learning

Authors
Zhang, Sheng [1]
Zhang, Zhe [1]
Maguluri, Siva Theja [1]
Affiliations
[1] Georgia Inst Technol, H Milton Stewart Sch Ind & Syst Engn, Atlanta, GA 30332 USA
Keywords
REINFORCEMENT; ALGORITHMS;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The focus of this paper is on sample complexity guarantees of average-reward reinforcement learning algorithms, which are known to be more challenging to study than their discounted-reward counterparts. To the best of our knowledge, we provide the first known finite sample guarantees using both constant and diminishing step sizes of (i) average-reward TD(lambda) with linear function approximation for policy evaluation and (ii) average-reward Q-learning in the tabular setting to find the optimal policy. A major challenge is that since the value functions are agnostic to an additive constant, the corresponding Bellman operators are no longer contraction mappings under any norm. We obtain the results for TD(lambda) by working in an appropriately defined subspace that ensures uniqueness of the solution. For Q-learning, we exploit the span seminorm contractive property of the Bellman operator, and construct a novel Lyapunov function obtained by infimal convolution of a generalized Moreau envelope and the indicator function of a set.
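The point about additive constants can be illustrated numerically. The sketch below is not taken from the paper: it uses a hypothetical 2-state, 2-action MDP (the tables `P` and `R` are made-up values) and an undiscounted Q-value Bellman operator with the average-reward offset omitted, since that offset shifts every entry equally and does not affect the span. It shows that shifting Q by a constant shifts the operator's output by the same constant, so the sup-norm distance does not shrink, while the span seminorm of the difference does.

```python
# Hypothetical 2-state, 2-action MDP (illustrative values, not from the paper).
# P[s][a][t] = probability of moving to state t after action a in state s.
P = [
    [[0.5, 0.5], [0.9, 0.1]],
    [[0.2, 0.8], [0.6, 0.4]],
]
# R[s][a] = one-step reward.
R = [[1.0, 0.0], [0.0, 2.0]]


def bellman(Q):
    """Undiscounted Q-value Bellman operator: R + P * max_a' Q(s', a')."""
    V = [max(row) for row in Q]  # greedy value of each next state
    return [
        [R[s][a] + sum(P[s][a][t] * V[t] for t in range(len(V)))
         for a in range(len(Q[s]))]
        for s in range(len(Q))
    ]


def diff(Q1, Q2):
    """Flattened entrywise difference Q1 - Q2."""
    return [x - y for r1, r2 in zip(Q1, Q2) for x, y in zip(r1, r2)]


def sup_norm(x):
    return max(abs(v) for v in x)


def span(x):
    """Span seminorm: max entry minus min entry (zero on constant vectors)."""
    return max(x) - min(x)


Q1 = [[0.0, 1.0], [2.0, 0.0]]
Q2 = [[1.0, 0.0], [0.0, 0.0]]
c = 5.0
Q1_shifted = [[q + c for q in row] for row in Q1]

# A constant shift passes straight through the operator.
shift_diff = diff(bellman(Q1_shifted), bellman(Q1))
print("sup-norm after shift:", sup_norm(shift_diff))  # stays (about) c
print("span after shift:", span(shift_diff))          # (about) zero

# For two genuinely different Q tables the span of the difference shrinks.
span_before = span(diff(Q1, Q2))
span_after = span(diff(bellman(Q1), bellman(Q2)))
print("span before:", span_before, "span after:", span_after)
```

In this toy example every transition probability is positive, which is what makes the span shrink strictly; the conditions under which the paper establishes span contraction are stated in the paper itself and may differ.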
Pages: 13