RL for Latent MDPs: Regret Guarantees and a Lower Bound

Cited by: 0
Authors
Kwon, Jeongyeol [1]
Efroni, Yonathan [2]
Caramanis, Constantine [1]
Mannor, Shie [3, 4]
Affiliations
[1] Univ Texas Austin, Austin, TX 78712 USA
[2] Microsoft Res, New York, NY USA
[3] Technion, Haifa, Israel
[4] NVIDIA, Santa Clara, CA USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this work, we consider the regret minimization problem for reinforcement learning in latent Markov Decision Processes (LMDPs). In an LMDP, an MDP is randomly drawn from a set of M possible MDPs at the beginning of the interaction, but the identity of the chosen MDP is not revealed to the agent. We first show that a general instance of LMDPs requires at least $\Omega((SA)^M)$ episodes, where S and A denote the numbers of states and actions, to even approximate the optimal policy. Then, we consider sufficient assumptions under which learning good policies requires only a polynomial number of episodes. We show that the key link is a notion of separation between the MDP system dynamics. With sufficient separation, we provide an efficient algorithm with a local guarantee, i.e., it achieves sublinear regret when given a good initialization. Finally, under standard statistical sufficiency assumptions common in the Predictive State Representation (PSR) literature (e.g., [6]) together with a reachability assumption, we show that the need for initialization can be removed.
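To make the LMDP interaction protocol described in the abstract concrete, here is a minimal toy sketch in Python. It is not the authors' algorithm and does not implement their separation-based method; every name (`TabularMDP`, `LatentMDP`, `run_episode`, `estimate_value`, `make_toy_lmdp`, the two policies) is hypothetical. It assumes tabular MDPs sharing one state and action space, a shared initial state, and deterministic rewards. The latent index m is sampled once per episode and hidden from the policy, which only sees the history of the current episode.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A history is the list of (state, action, reward) triples seen so far in the episode.
History = List[Tuple[int, int, float]]
Policy = Callable[[History, int], int]


@dataclass
class TabularMDP:
    transition: List[List[List[float]]]  # transition[s][a][s'] = P(s' | s, a)
    reward: List[List[float]]            # reward[s][a] in [0, 1]; deterministic in this toy


@dataclass
class LatentMDP:
    mdps: List[TabularMDP]  # the M candidate MDPs, all on the same states and actions
    weights: List[float]    # mixing weights w_1..w_M
    init_state: int         # shared initial state (simplifying assumption)
    horizon: int            # episode length H

    def run_episode(self, policy: Policy, rng: random.Random) -> Tuple[float, int]:
        """Sample the latent index once per episode, then roll out the policy.

        The policy only sees the history and the current state, never m;
        m is returned purely so the caller can inspect it afterwards.
        """
        m = rng.choices(range(len(self.mdps)), weights=self.weights)[0]
        mdp = self.mdps[m]
        state, history, total = self.init_state, [], 0.0
        for _ in range(self.horizon):
            action = policy(history, state)
            r = mdp.reward[state][action]
            total += r
            history.append((state, action, r))
            probs = mdp.transition[state][action]
            state = rng.choices(range(len(probs)), weights=probs)[0]
        return total, m


def estimate_value(lmdp: LatentMDP, policy: Policy, episodes: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the policy's value, averaged over the latent MDPs."""
    rng = random.Random(seed)
    return sum(lmdp.run_episode(policy, rng)[0] for _ in range(episodes)) / episodes


def make_toy_lmdp(weights: List[float], horizon: int = 3) -> LatentMDP:
    """Two one-state MDPs: MDP 0 rewards action 0, MDP 1 rewards action 1."""
    stay = [[[1.0], [1.0]]]  # one state; both actions loop back to it
    mdp0 = TabularMDP(transition=stay, reward=[[1.0, 0.0]])
    mdp1 = TabularMDP(transition=stay, reward=[[0.0, 1.0]])
    return LatentMDP([mdp0, mdp1], weights, init_state=0, horizon=horizon)


def always_zero(history: History, state: int) -> int:
    """Memoryless policy: always play action 0."""
    return 0


def probe_then_commit(history: History, state: int) -> int:
    """History-dependent policy: try action 0 once, keep it if it paid off, else switch to 1."""
    if not history:
        return 0
    return 0 if history[0][2] > 0 else 1


if __name__ == "__main__":
    lmdp = make_toy_lmdp([0.5, 0.5], horizon=3)
    print(estimate_value(lmdp, always_zero, 2000))        # close to 1.5
    print(estimate_value(lmdp, probe_then_commit, 2000))  # close to 2.5
```

In this toy, the memoryless policy earns 3 in MDP 0 and 0 in MDP 1, while the probing policy earns 3 and 2, because its first reward reveals which MDP was drawn. This illustrates why policies in an LMDP are naturally history-dependent; the toy deliberately makes the latent MDP trivially identifiable, which real LMDP instances need not be.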
Pages: 12
Related papers
50 in total
  • [41] Optimism in Face of a Context: Regret Guarantees for Stochastic Contextual MDP
    Levy, Orin
    Mansour, Yishay
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 7, 2023, : 8510 - 8517
  • [42] Regret Analysis for RL using Renewal Bandit Feedback
    Bhatt, Sujay
    Fang, Guanhua
    Li, Ping
    Samorodnitsky, Gennady
    2022 IEEE INFORMATION THEORY WORKSHOP (ITW), 2022, : 137 - 142
  • [43] Learning in Structured MDPs with Convex Cost Functions: Improved Regret Bounds for Inventory Management
    Agrawal, Shipra
    Jia, Randy
    OPERATIONS RESEARCH, 2022, 70 (03): : 1646 - 1664
  • [45] Learning in Structured MDPs with Convex Cost Functions: Improved Regret Bounds for Inventory Management
    Agrawal, Shipra
    Jia, Randy
    ACM EC '19: PROCEEDINGS OF THE 2019 ACM CONFERENCE ON ECONOMICS AND COMPUTATION, 2019, : 743 - 744
  • [46] PDFA Distillation with Error Bound Guarantees
    Baumgartner, Robert
    Verwer, Sicco
    IMPLEMENTATION AND APPLICATION OF AUTOMATA, CIAA 2024, 2024, 15015 : 51 - 65
  • [47] Discovery and density estimation of latent confounders in Bayesian networks with evidence lower bound
    Chobtham, Kiattikun
    Constantinou, Anthony C.
    INTERNATIONAL CONFERENCE ON PROBABILISTIC GRAPHICAL MODELS, VOL 186, 2022, 186
  • [48] Liquid Welfare Guarantees for No-Regret Learning in Sequential Budgeted Auctions
    Fikioris, Giannis
    Tardos, Eva
    MATHEMATICS OF OPERATIONS RESEARCH, 2024,
  • [49] Efficient processing of k-regret minimization queries with theoretical guarantees
    Zheng, Jiping
    Dong, Qi
    Wang, Xiaoyang
    Zhang, Ying
    Ma, Wei
    Ma, Yuan
    INFORMATION SCIENCES, 2022, 586 : 99 - 118
  • [50] Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees
    Tirinzoni, Andrea
    Papini, Matteo
    Touati, Ahmed
    Lazaric, Alessandro
    Pirotta, Matteo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,