Improved Sample Complexity for Incremental Autonomous Exploration in MDPs

Cited by: 0
Authors
Tarbouriech, Jean [1, 2]
Pirotta, Matteo [1 ]
Valko, Michal [3 ]
Lazaric, Alessandro [1 ]
Affiliations
[1] Facebook AI Res Paris, Paris, France
[2] Inria Lille, Lille, France
[3] DeepMind Paris, Paris, France
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We investigate the exploration of an unknown environment when no reward function is provided. Building on the incremental exploration setting introduced by Lim and Auer [1], we define the objective of learning the set of $\epsilon$-optimal goal-conditioned policies attaining all states that are incrementally reachable within $L$ steps (in expectation) from a reference state $s_0$. In this paper, we introduce a novel model-based approach that interleaves discovering new states from $s_0$ and improving the accuracy of a model estimate that is used to compute goal-conditioned policies to reach newly discovered states. The resulting algorithm, DisCo, achieves a sample complexity scaling as $\widetilde{O}(L^5 S_{L+\epsilon} \Gamma_{L+\epsilon} A \epsilon^{-2})$, where $A$ is the number of actions, $S_{L+\epsilon}$ is the number of states that are incrementally reachable from $s_0$ in $L + \epsilon$ steps, and $\Gamma_{L+\epsilon}$ is the branching factor of the dynamics over such states. This improves over the algorithm proposed in [1] in both $\epsilon$ and $L$ at the cost of an extra $\Gamma_{L+\epsilon}$ factor, which is small in most environments of interest. Furthermore, DisCo is the first algorithm that can return an $\epsilon/c_{\min}$-optimal policy for any cost-sensitive shortest-path problem defined on the $L$-reachable states with minimum cost $c_{\min}$. Finally, we report preliminary empirical results confirming our theoretical findings.
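To make the scaling in the abstract concrete, the following minimal Python sketch evaluates only the leading term $L^5 S_{L+\epsilon} \Gamma_{L+\epsilon} A \epsilon^{-2}$ of the stated bound. It is an illustration, not the authors' code: the function name `disco_leading_term` and the example values are hypothetical, and the constants and logarithmic factors hidden by the $\widetilde{O}$ notation are ignored.

```python
def disco_leading_term(L: float, S: int, Gamma: int, A: int, eps: float) -> float:
    """Leading term L^5 * S * Gamma * A / eps^2 of the bound in the abstract.

    Assumption: constants and log factors hidden by the O-tilde are dropped,
    so the value is only meaningful for comparing how the bound scales.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return (L ** 5) * S * Gamma * A / (eps ** 2)


# Hypothetical example values, chosen only to show the scaling.
base = disco_leading_term(L=2, S=3, Gamma=2, A=2, eps=0.5)
finer = disco_leading_term(L=2, S=3, Gamma=2, A=2, eps=0.25)
longer = disco_leading_term(L=4, S=3, Gamma=2, A=2, eps=0.5)

print(base)           # leading term at the base setting
print(finer / base)   # effect of halving eps
print(longer / base)  # effect of doubling L
```

Under these assumptions, halving $\epsilon$ multiplies the leading term by 4 and doubling $L$ multiplies it by 32, provided $S_{L+\epsilon}$ and $\Gamma_{L+\epsilon}$ are held fixed. In practice both quantities may themselves grow with $L$.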
Pages: 12
Related papers
50 in total
  • [1] Autonomous Exploration for Navigating in MDPs Using Blackbox RL Algorithms
    Gajane, Pratik
    Auer, Peter
    Ortner, Ronald
    [J]. PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 3714 - 3722
  • [2] Near-Optimal Sample Complexity Bounds for Constrained MDPs
    Vaswani, Sharan
    Yang, Lin F.
    Szepesvari, Csaba
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022
  • [3] Towards Tight Bounds on the Sample Complexity of Average-reward MDPs
    Jin, Yujia
    Sidford, Aaron
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [4] Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs
    HasanzadeZonuzy, Aria
    Bura, Archana
    Kalathil, Dileep
    Shakkottai, Srinivas
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 7667 - 7674
  • [5] Layered State Discovery for Incremental Autonomous Exploration
    Chen, Liyu
    Tirinzoni, Andrea
    Lazaric, Alessandro
    Pirotta, Matteo
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202
  • [6] Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
    Luo, Haipeng
    Wei, Chen-Yu
    Lee, Chung-Wei
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [7] On the Sample Complexity of Learning Infinite-horizon Discounted Linear Kernel MDPs
    Chen, Yuanzhou
    He, Jiafan
    Gu, Quanquan
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [8] Autonomous Robotic Exploration by Incremental Road Map Construction
    Wang, Chaoqun
    Chi, Wenzheng
    Sun, Yuxiang
    Meng, Max Q.-H.
    [J]. IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2019, 16 (04) : 1720 - 1731
  • [9] Improved bounds on the sample complexity of learning
    Li, Y
    Long, PM
    [J]. JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2001, 62 (03) : 516 - 527
  • [10] Improved bounds on the sample complexity of learning
    Li, Y
    Long, PM
    Srinivasan, A
    [J]. PROCEEDINGS OF THE ELEVENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2000, : 309 - 318