Finite Sample Analysis of Average-Reward TD Learning and Q-Learning

Times cited: 0

Authors:

Zhang, Sheng [1]

Zhang, Zhe [1]

Maguluri, Siva Theja [1]

Affiliation:

[1] Georgia Inst Technol, H Milton Stewart Sch Ind & Syst Engn, Atlanta, GA 30332 USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34

Keywords:

REINFORCEMENT; ALGORITHMS;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper focuses on sample complexity guarantees for average-reward reinforcement learning algorithms, which are known to be more challenging to analyze than their discounted-reward counterparts. To the best of our knowledge, we provide the first finite-sample guarantees, under both constant and diminishing step sizes, for (i) average-reward TD(λ) with linear function approximation for policy evaluation and (ii) average-reward Q-learning in the tabular setting for finding an optimal policy. A major challenge is that the value functions are determined only up to an additive constant, so the corresponding Bellman operators are not contraction mappings under any norm. For TD(λ), we obtain the results by working in an appropriately defined subspace that ensures the solution is unique. For Q-learning, we exploit the contraction property of the Bellman operator with respect to the span seminorm and construct a novel Lyapunov function obtained as the infimal convolution of a generalized Moreau envelope and the indicator function of a set.
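The policy-evaluation setting in item (i) can be illustrated with a minimal sketch. It assumes a common textbook form of average-reward TD(λ) with linear features: a running estimate of the average reward, a TD error that subtracts that estimate instead of discounting, and an accumulating eligibility trace. The function name `average_reward_td_lambda`, the two-state chain, and the step-size choices are hypothetical and chosen only for illustration; this is not the authors' exact iterate or step-size schedule.

```python
import random


def average_reward_td_lambda(P, reward, features, lam, alpha, num_steps, seed=0):
    """Sketch of average-reward TD(lambda) with linear function approximation.

    P[s][s2]    -- transition probabilities of the chain induced by the policy
    reward[s]   -- reward collected in state s
    features[s] -- feature vector phi(s), a list of floats
    Returns (theta, rho_hat): weights of the differential value estimate
    phi(s)^T theta, and the running estimate of the average reward.
    """
    rng = random.Random(seed)
    states = range(len(P))
    dim = len(features[0])
    theta = [0.0] * dim   # linear weights
    trace = [0.0] * dim   # accumulating eligibility trace z_t
    rho_hat = 0.0         # average-reward estimate
    s = 0
    for t in range(num_steps):
        s_next = rng.choices(states, weights=P[s])[0]
        r = reward[s]
        v = sum(w * f for w, f in zip(theta, features[s]))
        v_next = sum(w * f for w, f in zip(theta, features[s_next]))
        # Average-reward TD error: no discount factor; rho_hat is subtracted instead.
        delta = r - rho_hat + v_next - v
        # z_t = lambda * z_{t-1} + phi(s_t)
        trace = [lam * z + f for z, f in zip(trace, features[s])]
        # theta_{t+1} = theta_t + alpha * delta_t * z_t (constant step size here)
        theta = [w + alpha * delta * z for w, z in zip(theta, trace)]
        # Running mean of observed rewards (step size 1/(t+1)).
        rho_hat += (r - rho_hat) / (t + 1)
        s = s_next
    return theta, rho_hat


if __name__ == "__main__":
    # Hypothetical two-state chain: reward 1 in state 0, reward 0 in state 1.
    P = [[0.7, 0.3], [0.2, 0.8]]
    reward = [1.0, 0.0]
    one_hot = [[1.0, 0.0], [0.0, 1.0]]  # tabular (one-hot) features
    theta, rho_hat = average_reward_td_lambda(
        P, reward, one_hot, lam=0.5, alpha=0.001, num_steps=200_000)
    print("average reward estimate:", round(rho_hat, 3))
    print("estimated h(0) - h(1):", round(theta[0] - theta[1], 2))
```

For this chain the true average reward is 0.4 and the true differential-value gap h(0) - h(1) is 2, so the printed estimates should land near those values. With one-hot features the all-ones direction lies in the feature span, so `theta[0] + theta[1]` is not pinned down and can drift; only differences are identified, which is the non-uniqueness that the subspace argument in the abstract is meant to remove.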
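For item (ii), a minimal tabular sketch of average-reward Q-learning follows. It uses relative value iteration (RVI) Q-learning, a standard average-reward variant that subtracts the current value of a fixed reference state-action pair in every update so that the free additive constant is pinned down; this is an assumption made for illustration and not necessarily the exact iterate analyzed in the paper. The function name `rvi_q_learning`, the two-state MDP, the uniform-random behavior policy, and the constant step size are all hypothetical.

```python
import random


def rvi_q_learning(P, r, ref, alpha, num_steps, seed=0):
    """Tabular RVI Q-learning sketch with a uniform-random behavior policy.

    P[s][a] -- list of next-state probabilities; r[s][a] -- reward.
    ref     -- hypothetical reference pair (s_ref, a_ref); f(Q) = Q[s_ref][a_ref]
               is subtracted in every update to remove the additive constant.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0
    for _ in range(num_steps):
        a = rng.randrange(n_actions)                       # exploratory action
        s_next = rng.choices(range(n_states), weights=P[s][a])[0]
        f = Q[ref[0]][ref[1]]                              # offset f(Q)
        target = r[s][a] + max(Q[s_next]) - f
        Q[s][a] += alpha * (target - Q[s][a])              # constant step size
        s = s_next
    return Q


if __name__ == "__main__":
    STAY, MOVE = 0, 1
    # Hypothetical MDP. State 0: STAY earns 0.5; MOVE earns 0 and reaches
    # state 1 w.p. 0.8. State 1: STAY earns 1; MOVE earns 0 and returns to 0.
    P = [[[1.0, 0.0], [0.2, 0.8]],
         [[0.0, 1.0], [1.0, 0.0]]]
    r = [[0.5, 0.0],
         [1.0, 0.0]]
    Q = rvi_q_learning(P, r, ref=(1, STAY), alpha=0.02, num_steps=100_000)
    greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(2)]
    print("greedy policy:", greedy)
    print("average-reward estimate:", round(Q[1][STAY], 3))
```

In this MDP the optimal policy moves from state 0 and stays in state 1, earning average reward 1. At the RVI fixed point the reference entry Q(1, STAY) equals the optimal average reward, which is why it is printed as the estimate; the greedy policy should come out as `[1, 0]`.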
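The two operator properties that the abstract relies on for Q-learning can be checked numerically. The sketch assumes the undiscounted Bellman operator H(Q)(s,a) = r(s,a) + sum over s' of P(s'|s,a) max_a' Q(s',a') on a small randomly generated MDP with strictly positive transition probabilities (a hypothetical example, not one from the paper). It checks that adding a constant to Q shifts H(Q) by the same constant, so H cannot contract in any norm, while sp(H(Q1) - H(Q2)) <= (1 - nu) sp(Q1 - Q2), where sp is the span seminorm and nu is the smallest overlap sum_s' min(P(s'|s,a), P(s'|s',a')) between two transition rows (a standard Dobrushin-type bound). The flat list layout and helper names are illustrative assumptions.

```python
import random


def span(x):
    """Span seminorm sp(x) = max_i x_i - min_i x_i (zero on constant vectors)."""
    return max(x) - min(x)


def bellman_q(Q, r, P, n_states, n_actions):
    """Undiscounted Bellman operator on a hypothetical flat layout i = s * n_actions + a:
    H(Q)[i] = r[i] + sum_{s'} P[i][s'] * max_{a'} Q(s', a')."""
    v = [max(Q[s * n_actions:(s + 1) * n_actions]) for s in range(n_states)]
    return [r[i] + sum(p * v[s2] for s2, p in enumerate(P[i]))
            for i in range(n_states * n_actions)]


def min_row_overlap(P):
    """Smallest overlap sum_{s'} min(P[i][s'], P[j][s']) over all pairs of rows.
    For any vector g, sp(P g) <= (1 - overlap) * sp(g)."""
    return min(sum(min(a, b) for a, b in zip(Pi, Pj)) for Pi in P for Pj in P)


def random_mdp(n_states, n_actions, rng):
    """Hypothetical random MDP with strictly positive transition probabilities."""
    P = []
    for _ in range(n_states * n_actions):
        w = [rng.random() + 0.1 for _ in range(n_states)]
        total = sum(w)
        P.append([x / total for x in w])
    r = [rng.random() for _ in range(n_states * n_actions)]
    return P, r


if __name__ == "__main__":
    rng = random.Random(0)
    nS, nA = 3, 2
    P, r = random_mdp(nS, nA, rng)
    Q1 = [rng.uniform(-5, 5) for _ in range(nS * nA)]
    Q2 = [rng.uniform(-5, 5) for _ in range(nS * nA)]
    H1, H2 = bellman_q(Q1, r, P, nS, nA), bellman_q(Q2, r, P, nS, nA)

    # Shift invariance: H(Q + c) = H(Q) + c, so sup-norm distances do not shrink.
    H_shift = bellman_q([q + 3.0 for q in Q1], r, P, nS, nA)
    print("max |H(Q+3) - H(Q) - 3|:", max(abs(a - b - 3.0) for a, b in zip(H_shift, H1)))

    # Span contraction with modulus 1 - overlap < 1.
    gamma = 1.0 - min_row_overlap(P)
    lhs = span([a - b for a, b in zip(H1, H2)])
    rhs = gamma * span([a - b for a, b in zip(Q1, Q2)])
    print(f"gamma = {gamma:.3f}, sp(H1 - H2) = {lhs:.3f} <= {rhs:.3f}")
```

The first printed value is zero up to floating-point rounding, which is exactly why a plain norm-contraction argument fails; the span seminorm ignores the constant direction, and positive overlap between transition rows is what makes the modulus strictly less than 1 in this example.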
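The Lyapunov construction mentioned in the abstract combines two standard convex-analysis operations. The block below records their usual definitions as background only; it is not the paper's Lyapunov function. Taking the set to be the constant vectors E, and g to be a squared p-norm, are assumptions made here to show why an indicator function pairs naturally with the span seminorm: infimal convolution with the indicator of E makes a function insensitive to additive constants, and for the sup norm it yields exactly half the span seminorm.

```latex
% Infimal convolution of two extended-real-valued functions f and g on R^d
(f \,\square\, g)(x) = \inf_{u \in \mathbb{R}^d} \bigl\{ f(u) + g(x - u) \bigr\}

% Indicator function of a set C
\delta_C(x) = \begin{cases} 0, & x \in C, \\ +\infty, & x \notin C \end{cases}

% Generalized Moreau envelope of f with parameter \mu > 0, where g is a smooth
% function (for example g = \tfrac{1}{2}\|\cdot\|_p^2 with 2 \le p < \infty)
M_f^{\mu}(x) = \inf_{u \in \mathbb{R}^d} \Bigl\{ f(u) + \tfrac{1}{\mu}\, g(x - u) \Bigr\}
            = \bigl( f \,\square\, \tfrac{1}{\mu} g \bigr)(x)

% Illustrative choice (assumption): C = E = \{ c\mathbf{1} : c \in \mathbb{R} \}
(M_f^{\mu} \,\square\, \delta_E)(x) = \inf_{c \in \mathbb{R}} M_f^{\mu}(x - c\mathbf{1}),
\qquad
(\|\cdot\|_\infty \,\square\, \delta_E)(x) = \inf_{c \in \mathbb{R}} \|x - c\mathbf{1}\|_\infty
  = \tfrac{1}{2}\bigl(\max_i x_i - \min_i x_i\bigr) = \tfrac{1}{2}\,\mathrm{sp}(x)
```

The last identity holds because the best constant is the midpoint c = (max_i x_i + min_i x_i)/2.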

Pages: 13
