A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP

Cited by: 0
Authors
Chen, Fan [1 ]
Zhang, Junyu [2 ]
Wen, Zaiwen [3 ]
Affiliations
[1] Peking University, Beijing, China
[2] National University of Singapore, Department of Industrial Systems Engineering and Management, Singapore
[3] Peking University, Beijing International Center for Mathematical Research, Center for Machine Learning Research, Beijing, China
Keywords: none listed
DOI: not available
CLC (Chinese Library Classification): TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
As an important framework for safe reinforcement learning, the Constrained Markov Decision Process (CMDP) has been studied extensively in the recent literature. However, despite the rich results under various on-policy learning settings, essential understanding of offline CMDP problems is still lacking, in terms of both algorithm design and the information-theoretic sample complexity lower bound. In this paper, we focus on solving CMDP problems when only offline data are available. By adopting the concept of the single-policy concentrability coefficient $C^*$, we establish an $\Omega\!\left(\frac{\min\{|S||A|,\,|S|+I\}\,C^*}{(1-\gamma)^3\epsilon^2}\right)$ sample complexity lower bound for the offline CMDP problem, where $I$ stands for the number of constraints. By introducing a simple but novel deviation-control mechanism, we propose a near-optimal primal-dual learning algorithm called DPDL. This algorithm provably guarantees zero constraint violation, and its sample complexity matches the lower bound above up to an $\tilde{O}\big((1-\gamma)^{-1}\big)$ factor. We also discuss how to handle the unknown constant $C^*$ and a potential asynchronous structure in the offline dataset.
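To make the saddle-point structure behind a primal-dual CMDP method concrete, below is a minimal sketch of a generic Lagrangian primal-dual iteration on a toy tabular CMDP with a single constraint (I = 1). It is an illustration only, not the paper's DPDL algorithm: DPDL works from offline data and adds a deviation-control mechanism, both of which are omitted here, and every quantity below (the random kernel P, reward r, cost c, budget b) is invented for the demo.

import numpy as np

# Toy CMDP: maximize V_r(pi) subject to V_c(pi) <= b, via the Lagrangian
#   L(pi, lam) = V_r(pi) - lam * (V_c(pi) - b),  lam >= 0.
S, A, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = distribution over next states (hypothetical)
r = rng.uniform(size=(S, A))                 # reward (hypothetical)
c = rng.uniform(size=(S, A))                 # single constraint cost, I = 1 (hypothetical)
b = 0.5 / (1 - gamma)                        # constraint budget (hypothetical)
rho = np.ones(S) / S                         # initial-state distribution

def value(pi, u):
    """Discounted value rho^T (I - gamma P_pi)^{-1} u_pi of per-step payoff u, plus the state values."""
    P_pi = np.einsum('sa,sat->st', pi, P)    # state chain induced by pi
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, (pi * u).sum(axis=1))
    return rho @ v, v

pi = np.ones((S, A)) / A    # primal variable: stochastic policy
lam = 0.0                   # dual variable: Lagrange multiplier
eta_pi, eta_lam = 0.5, 0.05

for _ in range(500):
    # Primal step: mirror ascent (exponentiated gradient) on the Lagrangian
    # payoff r - lam * c, using its Q-values as the gradient surrogate.
    u = r - lam * c
    _, v = value(pi, u)
    q = u + gamma * P @ v
    pi = pi * np.exp(eta_pi * q)
    pi /= pi.sum(axis=1, keepdims=True)
    # Dual step: gradient ascent on the constraint violation, projected to [0, inf).
    vc, _ = value(pi, c)
    lam = max(0.0, lam + eta_lam * (vc - b))

vr, _ = value(pi, r)
vc, _ = value(pi, c)
print(f"reward value {vr:.3f}, cost value {vc:.3f} (budget {b:.3f}), lambda {lam:.3f}")

In the offline setting, V_c(pi) and the Q-values would have to be estimated from logged transitions rather than computed from a known P; controlling the error of those estimates under distribution shift is where the concentrability coefficient C* and DPDL's deviation-control mechanism enter.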
Pages: 12
Related Papers (50 in total; items [41]-[50] shown)
  • [41] Zosso, D.; Osting, B.; Xia, M.; Osher, S. J. An Efficient Primal-Dual Method for the Obstacle Problem. Journal of Scientific Computing, 2017, 73(1): 416-437.
  • [42] Gondzio, J.; Grothey, A. Reoptimization with the Primal-Dual Interior Point Method. SIAM Journal on Optimization, 2003, 13(3): 842-864.
  • [43] Mehrotra, S. On the Implementation of a Primal-Dual Interior Point Method. SIAM Journal on Optimization, 1992, 2(4): 575-601.
  • [44] Hao, Y.; Xu, J.; Bai, J. Primal-Dual Method for the Coupled Variational Model. Computers & Electrical Engineering, 2014, 40(3): 808-818.
  • [45] Harutyunyan, A.; Vrancx, P.; Bacon, P.-L.; Precup, D.; Nowe, A. Learning with Options that Terminate Off-Policy. Thirty-Second AAAI Conference on Artificial Intelligence, 2018: 3173-3182.
  • [46] Zhu, K.; Tang, Y. Primal-Dual ε-Subgradient Method for Distributed Optimization. Journal of Systems Science & Complexity, 2023, 36(2): 577-590.
  • [48] Bhattacharya, S.; Henzinger, M.; Italiano, G. Dynamic Algorithms via the Primal-Dual Method. Information and Computation, 2018, 261: 219-239.
  • [49] Gabbianelli, G.; Neu, G.; Papini, M. Online Learning with Off-Policy Feedback. International Conference on Algorithmic Learning Theory, 2023, 201: 620-641.
  • [50] He, L.; Xia, L.; Zeng, W.; Ma, Z.-M.; Zhao, Y.; Yin, D. Off-Policy Learning for Multiple Loggers. KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019: 1184-1193.