A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP

Cited by: 0
Authors
Chen, Fan [1 ]
Zhang, Junyu [2 ]
Wen, Zaiwen [3 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Natl Univ Singapore, Dept Ind Syst Engn & Management, Singapore, Singapore
[3] Peking Univ, Beijing Int Ctr Math Res Ctr Machine Learning Res, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
As an important framework for safe reinforcement learning, the Constrained Markov Decision Process (CMDP) has been extensively studied in the recent literature. However, despite the rich results under various on-policy learning settings, essential understanding of the offline CMDP problem is still lacking, in terms of both algorithm design and the information-theoretic sample complexity lower bound. In this paper, we focus on solving CMDP problems where only offline data are available. By adopting the concept of the single-policy concentrability coefficient $C^*$, we establish an $\Omega\!\left(\frac{\min\{|S||A|,\,|S|+I\}\,C^*}{(1-\gamma)^3\epsilon^2}\right)$ sample complexity lower bound for the offline CMDP problem, where $I$ stands for the number of constraints. By introducing a simple but novel deviation control mechanism, we propose a near-optimal primal-dual learning algorithm called DPDL. This algorithm provably guarantees zero constraint violation, and its sample complexity matches the above lower bound up to an $\tilde{O}((1-\gamma)^{-1})$ factor. A comprehensive discussion of how to deal with the unknown constant $C^*$ and the potential asynchronous structure of the offline dataset is also included.
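For intuition about the primal-dual mechanism the abstract refers to, the following is a minimal sketch of a generic Lagrangian primal-dual update for a tabular CMDP, written under the simplifying assumption of a known model; the names P, r, g, b, mu0 and the step sizes are hypothetical inputs chosen for illustration. It is not the DPDL algorithm and omits both the offline-data estimation and the deviation control mechanism described in the paper.

    # Illustrative sketch only (NOT the paper's DPDL): a generic Lagrangian
    # primal-dual update for max_pi V_r(pi) subject to V_g(pi) >= b, assuming
    # a known tabular model P (S x A x S), reward r (S x A), utility g (S x A),
    # threshold b, and initial distribution mu0 (S,).
    import numpy as np

    def values(P, r, g, pi, gamma, mu0):
        """Exact discounted returns V_r(pi) and V_g(pi) under the known model."""
        S = r.shape[0]
        P_pi = np.einsum('sab,sa->sb', P, pi)          # state transitions under pi
        inv = np.linalg.inv(np.eye(S) - gamma * P_pi)
        return (mu0 @ inv @ np.einsum('sa,sa->s', r, pi),
                mu0 @ inv @ np.einsum('sa,sa->s', g, pi))

    def primal_dual_cmdp(P, r, g, b, mu0, gamma=0.9, iters=500,
                         eta_pi=1.0, eta_lam=0.5):
        """Alternate policy (primal) ascent and multiplier (dual) descent."""
        S, A = r.shape
        pi, lam = np.full((S, A), 1.0 / A), 0.0        # uniform policy, multiplier 0
        for _ in range(iters):
            mix = r + lam * g                          # Lagrangian reward r + lam * g
            P_pi = np.einsum('sab,sa->sb', P, pi)
            V = np.linalg.solve(np.eye(S) - gamma * P_pi,
                                np.einsum('sa,sa->s', mix, pi))
            Q = mix + gamma * np.einsum('sab,b->sa', P, V)
            pi = pi * np.exp(eta_pi * Q)               # primal: exponentiated-gradient ascent
            pi /= pi.sum(axis=1, keepdims=True)
            _, Vg = values(P, r, g, pi, gamma, mu0)
            lam = max(0.0, lam - eta_lam * (Vg - b))   # dual: grow lam while constraint violated
        return pi, lam

In the offline setting studied in the paper, the Q-values would instead have to be estimated from logged data gathered by a behavior policy, which is where the single-policy concentrability coefficient $C^*$ and the deviation control mechanism become relevant.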
Pages: 12
Related Papers
50 items in total
  • [1] Distributed Off-Policy Temporal Difference Learning Using Primal-Dual Method
    Lee, Donghwan
    Kim, Do Wan
    Hu, Jianghai
    IEEE ACCESS, 2022, 10 : 107077 - 107094
  • [2] Near-Optimal Rapid MPC Using Neural Networks: A Primal-Dual Policy Learning Framework
    Zhang, Xiaojing
    Bujarbaruah, Monimoy
    Borrelli, Francesco
    IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, 2021, 29 (05) : 2102 - 2114
  • [3] Safe and Near-Optimal Policy Learning for Model Predictive Control using Primal-Dual Neural Networks
    Zhang, Xiaojing
    Bujarbaruah, Monimoy
    Borrelli, Francesco
    2019 AMERICAN CONTROL CONFERENCE (ACC), 2019, : 354 - 359
  • [4] Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
    Qiu, Shuang
    Wei, Xiaohan
    Yang, Zhuoran
    Ye, Jieping
    Wang, Zhaoran
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [5] The Primal-Dual method for Learning Augmented Algorithms
    Bamas, Etienne
    Maggiori, Andreas
    Svensson, Ola
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [6] Primal-Dual Estimator Learning Method with Feasibility and Near-Optimality Guarantees
    Cao, Wenhan
    Duan, Jingliang
    Li, Shengbo Eben
    Chen, Chen
    Liu, Chang
    Wang, Yu
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 4104 - 4111
  • [7] A projected primal-dual gradient optimal control method for deep reinforcement learning
    Gottschalk, Simon
    Burger, Michael
    Gerdts, Matthias
    JOURNAL OF MATHEMATICS IN INDUSTRY, 2020, 10 (01)
  • [8] Optimal reactive dispatch through primal-dual method
    daCosta, GRM
    IEEE TRANSACTIONS ON POWER SYSTEMS, 1997, 12 (02) : 669 - 674
  • [9] Near-optimal tensor methods for minimizing the gradient norm of convex functions and accelerated primal-dual tensor methods
    Dvurechensky, Pavel
    Ostroukhov, Petr
    Gasnikov, Alexander
    Uribe, Cesar A.
    Ivanova, Anastasiya
    OPTIMIZATION METHODS & SOFTWARE, 2024, 39 (05): : 1068 - 1103