Improved Sample Complexity for Incremental Autonomous Exploration in MDPs

Cited by: 0

Authors:

Tarbouriech, Jean [1,2]

Pirotta, Matteo [1]

Valko, Michal [3]

Lazaric, Alessandro [1]

Affiliations:

[1] Facebook AI Res Paris, Paris, France

[2] Inria Lille, Lille, France

[3] DeepMind Paris, Paris, France

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020 | 2020 / Vol. 33

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We investigate the exploration of an unknown environment when no reward function is provided. Building on the incremental exploration setting introduced by Lim and Auer [1], we define the objective of learning the set of $\epsilon$-optimal goal-conditioned policies attaining all states that are incrementally reachable within $L$ steps (in expectation) from a reference state $s_0$. In this paper, we introduce a novel model-based approach that interleaves discovering new states from $s_0$ and improving the accuracy of a model estimate that is used to compute goal-conditioned policies to reach newly discovered states. The resulting algorithm, DisCo, achieves a sample complexity scaling as $\widetilde{O}(L^5 S_{L+\epsilon} \Gamma_{L+\epsilon} A \epsilon^{-2})$, where $A$ is the number of actions, $S_{L+\epsilon}$ is the number of states that are incrementally reachable from $s_0$ in $L + \epsilon$ steps, and $\Gamma_{L+\epsilon}$ is the branching factor of the dynamics over such states. This improves over the algorithm proposed in [1] in both $\epsilon$ and $L$ at the cost of an extra $\Gamma_{L+\epsilon}$ factor, which is small in most environments of interest. Furthermore, DisCo is the first algorithm that can return an $\epsilon/c_{\min}$-optimal policy for any cost-sensitive shortest-path problem defined on the $L$-reachable states with minimum cost $c_{\min}$. Finally, we report preliminary empirical results confirming our theoretical findings.
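The notion of states that are "incrementally reachable within $L$ steps (in expectation)" can be illustrated on a toy problem. The sketch below is not DisCo and is not taken from the paper: it assumes the transition model is fully known, whereas the paper's setting has to estimate it from samples. It also assumes one simplified reading of the definition, namely that a policy may only act in already-accepted states and that any other non-goal state sends the agent back to $s_0$ in one step. The 4-state MDP `P` and the helper names `hitting_time` and `incrementally_reachable` are hypothetical and chosen only for illustration.

```python
import numpy as np

def hitting_time(P, K, s0, goal, L, tol=1e-9, max_iter=100_000):
    """Minimum expected number of steps from s0 to goal when the policy may
    only choose actions in states of K (assumption: every other non-goal
    state resets to s0 in one step). Returns inf once the value exceeds L."""
    n_states = P.shape[0]
    V = np.zeros(n_states)  # starting from 0, iterates increase toward the optimum
    for _ in range(max_iter):
        V_new = np.empty(n_states)
        for s in range(n_states):
            if s == goal:
                V_new[s] = 0.0
            elif s in K:
                V_new[s] = 1.0 + (P[s] @ V).min()  # best action in a known state
            else:
                V_new[s] = 1.0 + V[s0]             # forced reset to s0
        if V_new[s0] > L:
            return np.inf  # iterates are lower bounds, so the goal is too far
        if np.max(np.abs(V_new - V)) < tol:
            return V_new[s0]
        V = V_new
    return np.inf  # not converged: conservatively treat as unreachable

def incrementally_reachable(P, s0, L):
    """Grow the set K from {s0}, adding any state reachable within L expected
    steps by a policy restricted to the states already in K."""
    K = {s0}
    changed = True
    while changed:
        changed = False
        for g in range(P.shape[0]):
            if g not in K and hitting_time(P, K, s0, g, L) <= L:
                K.add(g)
                changed = True
    return K

# Hypothetical 4-state, 2-action chain; P[s, a, s'] is a transition probability.
P = np.zeros((4, 2, 4))
P[0, 0, 0] = 1.0                     # action 0 in state 0: stay
P[0, 1, 0] = P[0, 1, 1] = 0.5        # action 1 in state 0: advance w.p. 0.5
P[1, 0, 0] = 1.0                     # action 0 in state 1: back to state 0
P[1, 1, 0] = P[1, 1, 2] = 0.5        # action 1 in state 1: advance w.p. 0.5
P[2, 0, 0] = 1.0                     # action 0 in state 2: back to state 0
P[2, 1, 3] = 1.0                     # action 1 in state 2: advance surely
P[3, :, 0] = 1.0                     # state 3 always returns to state 0

for L in (1.0, 3.0, 6.5, 7.5):
    print(L, sorted(incrementally_reachable(P, s0=0, L=L)))
```

In this toy chain, state 2 cannot be accepted before state 1: while state 1 is still outside the accepted set it resets to $s_0$, so no restricted policy reaches state 2. That ordering is the incremental aspect of the definition as read here.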


Pages: 12
