A Unified Lyapunov Framework for Finite-Sample Analysis of Reinforcement Learning Algorithms

Authors
Chen Z. [1]
Affiliation
[1] Georgia Tech ISyE, United States
Source
Performance Evaluation Review | 2023 / Vol. 50 / No. 03
Keywords
Approximation algorithms - Dynamic programming - Economic and social effects - Learning algorithms - Lyapunov functions - Markov processes - Sampling - Stochastic systems
DOI
10.1145/3579342.3579346
Abstract
Reinforcement learning (RL) is a paradigm in which an agent learns to accomplish tasks by interacting with the environment, similar to how humans learn. RL is therefore viewed as a promising approach to achieving artificial intelligence, as evidenced by its remarkable empirical successes. However, many RL algorithms are not well understood theoretically, especially in the setting where function approximation and off-policy sampling are employed. My thesis [1] aims to develop a thorough theoretical understanding of the performance of various RL algorithms through finite-sample analysis. Since most RL algorithms are essentially stochastic approximation (SA) algorithms for solving variants of the Bellman equation, the first part of the thesis is dedicated to the analysis of general SA involving a contraction operator under Markovian noise. We develop a Lyapunov approach in which we construct a novel Lyapunov function called the generalized Moreau envelope. The results on SA enable us to establish finite-sample bounds for various RL algorithms in the tabular setting (cf. Part II of the thesis) and when using function approximation (cf. Part III of the thesis), which in turn provide theoretical insights into several important problems in the RL community, such as the efficiency of bootstrapping, the bias-variance trade-off in off-policy learning, and the stability of off-policy control. The main body of this document provides an overview of the contributions of my thesis. © 2023 Copyright is held by the owner/author(s).
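The SA viewpoint in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function name `stochastic_approximation`, the two-state matrix `P`, the vector `r`, the discount `gamma`, and the step size are all hypothetical, and the noise is i.i.d. Gaussian for simplicity rather than Markovian as in the thesis. The iteration x_{k+1} = x_k + alpha (F(x_k) - x_k + w_k) is run for a Bellman-like contraction F(x) = gamma P x + r and compared with the exact fixed point.

```python
import numpy as np

def stochastic_approximation(F, x0, num_steps, step_size, noise_std, seed=0):
    """Run x <- x + step_size * (F(x) - x + noise) with i.i.d. Gaussian noise.

    Assumption: i.i.d. noise is a simplification; the thesis treats Markovian noise.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        noise = noise_std * rng.standard_normal(x.shape)
        x = x + step_size * (F(x) - x + noise)
    return x

# Hypothetical two-state example: F is a gamma-contraction in the sup norm
# because P is a row-stochastic matrix and gamma < 1.
gamma = 0.9
P = np.array([[0.5, 0.5],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])

def F(x):
    return gamma * P @ x + r

# Exact fixed point of F, solving (I - gamma * P) x = r.
x_star = np.linalg.solve(np.eye(2) - gamma * P, r)

# Noisy iterate after many small constant steps.
x_hat = stochastic_approximation(F, np.zeros(2), num_steps=20000,
                                 step_size=0.01, noise_std=0.1)
print("fixed point:", x_star)
print("SA iterate: ", x_hat)
```

With a constant step size the iterate does not converge exactly; it settles in a small neighborhood of the fixed point whose size depends on the step size and the noise level. Quantifying that kind of trade-off is what a finite-sample bound does.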
Pages: 12-15
Page count: 3