A Unified Lyapunov Framework for Finite-Sample Analysis of Reinforcement Learning Algorithms

Cited by: 0
Authors
Chen, Z. [1]
Affiliation
[1] Georgia Tech ISyE, United States
Source
Performance Evaluation Review | 2023, Vol. 50, No. 3
Keywords
Approximation algorithms; Dynamic programming; Economic and social effects; Learning algorithms; Lyapunov functions; Markov processes; Sampling; Stochastic systems
DOI
10.1145/3579342.3579346
Abstract
Reinforcement learning (RL) is a paradigm in which an agent learns to accomplish tasks by interacting with the environment, similar to how humans learn. RL is therefore viewed as a promising approach to achieving artificial intelligence, as evidenced by its remarkable empirical successes. However, many RL algorithms are not well understood theoretically, especially in the setting where function approximation and off-policy sampling are employed. My thesis [1] aims at developing a thorough theoretical understanding of the performance of various RL algorithms through finite-sample analysis. Since most RL algorithms are essentially stochastic approximation (SA) algorithms for solving variants of the Bellman equation, the first part of the thesis is dedicated to the analysis of general SA involving a contraction operator and driven by Markovian noise. We develop a Lyapunov approach in which we construct a novel Lyapunov function called the generalized Moreau envelope. The results on SA enable us to establish finite-sample bounds for various RL algorithms in the tabular setting (cf. Part II of the thesis) and when using function approximation (cf. Part III of the thesis), which in turn provide theoretical insights into several important problems in the RL community, such as the efficiency of bootstrapping, the bias-variance trade-off in off-policy learning, and the stability of off-policy control. The main body of this document provides an overview of the contributions of my thesis. © 2023 Copyright is held by the owner/author(s).
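To make the setup concrete, the following is a minimal sketch, in assumed notation rather than taken verbatim from the thesis, of the contractive SA template and of the generalized Moreau envelope used as a Lyapunov function:
\[
x_{k+1} = x_k + \alpha_k \big( F(x_k, Y_k) - x_k \big),
\qquad \bar{F}(x) := \mathbb{E}_{Y \sim \mu}\big[ F(x, Y) \big],
\qquad \| \bar{F}(x) - \bar{F}(y) \|_c \le \gamma \| x - y \|_c, \quad \gamma \in (0, 1),
\]
where \(\{Y_k\}\) is a Markov chain with stationary distribution \(\mu\), and \(x^*\) denotes the unique fixed point of the contraction \(\bar{F}\). The Lyapunov function is the generalized Moreau envelope of \(f(x) = \tfrac{1}{2}\|x\|_c^2\), smoothed with respect to a smooth norm \(\|\cdot\|_s\):
\[
M(x) = \min_{u} \Big\{ \tfrac{1}{2} \| u \|_c^2 + \tfrac{1}{2\theta} \| x - u \|_s^2 \Big\},
\]
which is smooth and equivalent, up to constants, to \(\tfrac{1}{2}\|x\|_c^2\). A one-step drift inequality of the form \(\mathbb{E}[M(x_{k+1} - x^*)] \le (1 - O(\alpha_k))\, \mathbb{E}[M(x_k - x^*)] + O(\alpha_k^2)\) then yields the finite-sample bounds.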
Pages: 12-15
Number of pages: 3
Related Papers
50 records in total
  • [31] Finite-sample Analysis of Interpolating Linear Classifiers in the Overparameterized Regime
    Chatterji, Niladri S.
    Long, Philip M.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22
  • [32] Finite-Sample Analysis of Deep CCA-Based Unsupervised Post-Nonlinear Multimodal Learning
    Lyu, Qi
    Fu, Xiao
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (11) : 9568 - 9574
  • [33] Reinforcement Learning for Optimal Tracking and Regulation: A Unified Framework
    Lewis, F. L.
    Modares, H.
    Kiumarsi, B.
    2015 AMERICAN CONTROL CONFERENCE (ACC), 2015, : 5082 - 5082
  • [34] A unified framework to control estimation error in reinforcement learning
    Zhang, Yujia
    Li, Lin
    Wei, Wei
    Lv, Yunpeng
    Liang, Jiye
    NEURAL NETWORKS, 2024, 178
  • [35] Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators
    Chen, Zaiwei
    Maguluri, Siva Theja
    Shakkottai, Sanjay
    Shanmugam, Karthikeyan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [36] Convex Programs and Lyapunov Functions for Reinforcement Learning: A Unified Perspective on the Analysis of Value-Based Methods
    Guo, Xingang
    Hu, Bin
    2022 AMERICAN CONTROL CONFERENCE, ACC, 2022, : 3317 - 3322
  • [37] Finite-sample Guarantees for Nash Q-learning with Linear Function Approximation
    Cisneros-Velarde, Pedro
    Koyejo, Sanmi
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 424 - 432
  • [38] A finite-sample analysis of multi-step temporal difference estimates
    Duan, Yaqi
    Wainwright, Martin J.
    LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211
  • [39] Study of sample efficiency improvements for reinforcement learning algorithms
    Cao, Tianyue
    2020 9TH IEEE INTEGRATED STEM EDUCATION CONFERENCE (ISEC 2020), 2020,
  • [40] Box-counting clustering analysis - Corrections for finite-sample effects
    Borgani, S.
    Murante, G.
    PHYSICAL REVIEW E, 1994, 49 (06): : 4907 - 4912