A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP

Cited by: 0
Authors
Chen, Fan [1 ]
Zhang, Junyu [2 ]
Wen, Zaiwen [3 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Natl Univ Singapore, Dept Ind Syst Engn & Management, Singapore, Singapore
[3] Peking Univ, Beijing Int Ctr Math Res Ctr Machine Learning Res, Beijing, Peoples R China
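Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract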
As an important framework for safe Reinforcement Learning, the Constrained Markov Decision Process (CMDP) has been extensively studied in the recent literature. However, despite the rich results under various on-policy learning settings, an essential understanding of offline CMDP problems is still lacking, in terms of both algorithm design and the information-theoretic sample complexity lower bound. In this paper, we focus on solving CMDP problems where only offline data are available. By adopting the concept of the single-policy concentrability coefficient $C^*$, we establish an $\Omega\left(\frac{\min\{|S||A|,\,|S|+I\}\,C^*}{(1-\gamma)^3\epsilon^2}\right)$ sample complexity lower bound for the offline CMDP problem, where $I$ stands for the number of constraints. By introducing a simple but novel deviation control mechanism, we propose a near-optimal primal-dual learning algorithm called DPDL. This algorithm provably guarantees zero constraint violation, and its sample complexity matches the above lower bound except for an $\tilde{O}((1-\gamma)^{-1})$ factor. A comprehensive discussion of how to deal with the unknown constant $C^*$ and the potential asynchronous structure of the offline dataset is also included.
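As a rough illustration of how the bounds in the abstract scale, the following Python sketch evaluates the lower-bound expression and the corresponding DPDL rate with all constants and logarithmic factors dropped. The function and parameter names are hypothetical and do not come from the paper; the numbers it returns are orders of magnitude only, not actual sample counts.

```python
def offline_cmdp_lower_bound_scale(num_states, num_actions, num_constraints,
                                   c_star, gamma, epsilon):
    """Evaluate min{|S||A|, |S|+I} * C* / ((1-gamma)^3 * epsilon^2).

    Constants hidden by the Omega(.) notation are omitted (assumption:
    we only care about how the bound scales with each parameter).
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    dimension = min(num_states * num_actions, num_states + num_constraints)
    return dimension * c_star / ((1.0 - gamma) ** 3 * epsilon ** 2)


def dpdl_sample_complexity_scale(num_states, num_actions, num_constraints,
                                 c_star, gamma, epsilon):
    """Lower-bound scale times the extra (1-gamma)^{-1} factor.

    Logarithmic terms hidden by the O-tilde notation are ignored.
    """
    lower = offline_cmdp_lower_bound_scale(
        num_states, num_actions, num_constraints, c_star, gamma, epsilon)
    return lower / (1.0 - gamma)


if __name__ == "__main__":
    # Hypothetical problem sizes chosen only for illustration.
    args = dict(num_states=10, num_actions=5, num_constraints=3,
                c_star=2.0, gamma=0.5, epsilon=0.5)
    print(offline_cmdp_lower_bound_scale(**args))
    print(dpdl_sample_complexity_scale(**args))
```

With few constraints, the term $|S|+I$ is the smaller argument of the minimum, so the scale grows additively in the number of states and constraints rather than with the product $|S||A|$.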
Pages: 12
Related papers
50 in total
  • [31] Obtaining a feasible geometric programming primal solution, given a near-optimal dual solution
    Bricker, Dennis L.
    Choi, Jae Chul
    Engineering Optimization, 23 (04):
  • [32] A Primal-Dual Algorithm for Hybrid Federated Learning
    Overman, Tom
    Blum, Garrett
    Klabjan, Diego
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 13, 2024, : 14482 - 14489
  • [33] A primal-dual perspective of online learning algorithms
    Shalev-Shwartz, Shai
    Singer, Yoram
    MACHINE LEARNING, 2007, 69 (2-3) : 115 - 142
  • [34] A Primal-Dual Formulation for Deep Learning with Constraints
    Nandwani, Yatin
    Pathak, Abhishek
    Mausam
    Singla, Parag
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [35] A primal-dual perspective of online learning algorithms
    Shai Shalev-Shwartz
    Yoram Singer
    Machine Learning, 2007, 69 : 115 - 142
  • [36] Primal-dual strategy for constrained optimal control problems
    Bergounioux, M
    Ito, K
    Kunisch, K
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 1999, 37 (04) : 1176 - 1194
  • [37] AN INCREMENTAL PRIMAL-DUAL METHOD FOR GENERALIZED NETWORKS
    CURET, ND
    COMPUTERS & OPERATIONS RESEARCH, 1994, 21 (10) : 1051 - 1059
  • [38] Primal-Dual ε-Subgradient Method for Distributed Optimization
    Zhu, Kui
    Tang, Yutao
    JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY, 2023, 36 (02) : 577 - 590
  • [39] On a Primal-Dual Interior Point Filter Method
    Costa, M. Fernanda P.
    Fernandes, Edite M. G. P.
    NUMERICAL ANALYSIS AND APPLIED MATHEMATICS ICNAAM 2011: INTERNATIONAL CONFERENCE ON NUMERICAL ANALYSIS AND APPLIED MATHEMATICS, VOLS A-C, 2011, 1389
  • [40] Primal-Dual ε-Subgradient Method for Distributed Optimization
    Kui Zhu
    Yutao Tang
    Journal of Systems Science and Complexity, 2023, 36 : 577 - 590