Lower Bounds for Learning in Revealing POMDPs

Cited: 0
Authors
Chen, Fan [1 ]
Wang, Huan [2 ]
Xiong, Caiming [2 ]
Mei, Song [3 ]
Bai, Yu [2 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Salesforce AI Res, San Francisco, CA 94105 USA
[3] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper studies the fundamental limits of reinforcement learning (RL) in the challenging partially observable setting. While it is well established that learning in Partially Observable Markov Decision Processes (POMDPs) requires exponentially many samples in the worst case, a surge of recent work shows that polynomial sample complexities are achievable under the revealing condition, a natural condition requiring the observables to reveal some information about the unobserved latent states. However, the fundamental limits for learning in revealing POMDPs are much less understood, with existing lower bounds being rather preliminary and exhibiting substantial gaps from the current best upper bounds. We establish strong PAC and regret lower bounds for learning in revealing POMDPs. Our lower bounds scale polynomially in all relevant problem parameters in a multiplicative fashion, and achieve significantly smaller gaps against the current best upper bounds, providing a solid starting point for future studies. In particular, for multi-step revealing POMDPs, we show that (1) the latent state-space dependence is at least $\Omega(S^{1.5})$ in the PAC sample complexity, which is notably harder than the $\widetilde{\Theta}(S)$ scaling for fully observable MDPs; (2) any polynomial sublinear regret is at least $\Omega(T^{2/3})$, suggesting its fundamental difference from the single-step case, where $\widetilde{O}(\sqrt{T})$ regret is achievable. Technically, our hard instance construction adapts techniques in distribution testing, which is new to the RL literature and may be of independent interest. We also complement our results with new sharp regret upper bounds for strongly B-stable PSRs, which include single-step revealing POMDPs as a special case.
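In standard LaTeX notation, the separations stated in the abstract read as follows (here $S$ denotes the size of the latent state space; taking $T$ to be the total number of episodes of interaction is an assumption on notation, as the abstract does not define $T$):

\[
\text{PAC sample complexity:}\quad \Omega(S^{1.5}) \ \ \text{(multi-step revealing POMDPs)} \quad \text{vs.} \quad \widetilde{\Theta}(S) \ \ \text{(fully observable MDPs)},
\]
\[
\text{Regret:}\quad \Omega(T^{2/3}) \ \ \text{(multi-step revealing POMDPs)} \quad \text{vs.} \quad \widetilde{O}(\sqrt{T}) \ \ \text{(single-step revealing POMDPs)}.
\]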
Pages: 58