Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning

Cited by: 0
Authors
Feng, Fei [1 ]
Wang, Ruosong [2 ]
Yin, Wotao [1 ]
Du, Simon S. [3 ]
Yang, Lin F. [1 ]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90024 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA USA
[3] Univ Washington, Seattle, WA 98195 USA
Keywords
Bernoulli mixture models; Subspace; Algorithm
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems (Tang et al., 2017; Bellemare et al., 2016), we investigate when this paradigm is provably efficient. We study episodic Markov decision processes with rich observations generated from a small number of latent states. We present a general algorithmic framework that is built upon two components: an unsupervised learning algorithm and a no-regret tabular RL algorithm. Theoretically, we prove that as long as the unsupervised learning algorithm enjoys a polynomial sample complexity guarantee, we can find a near-optimal policy with sample complexity polynomial in the number of latent states, which is significantly smaller than the number of observations. Empirically, we instantiate our framework on a class of hard exploration problems to demonstrate the practicality of our theory.
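As a concrete illustration of the two-component framework described in the abstract, below is a minimal, hypothetical Python sketch: a k-means decoder stands in for the unsupervised learning component, and optimistic tabular Q-learning stands in for the no-regret tabular RL component, run on a toy block MDP whose rich observations are noisy emissions from a few latent states. The toy environment, the choice of k-means and Q-learning, and all names are illustrative assumptions, not the paper's exact construction.

```python
# Hypothetical sketch of the two-component framework: (1) an unsupervised learner
# decodes rich observations into a small number of latent states, (2) a tabular RL
# algorithm is run on the decoded states. Environment and algorithm choices below
# are assumptions for illustration only.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy "block MDP": a few latent states emit high-dimensional noisy observations.
N_LATENT, N_ACTIONS, OBS_DIM, HORIZON = 4, 2, 16, 5
EMIT_MEANS = rng.normal(size=(N_LATENT, OBS_DIM))                      # observation center per latent state
TRANS = rng.dirichlet(np.ones(N_LATENT), size=(N_LATENT, N_ACTIONS))   # latent transition kernel
REWARD = rng.uniform(size=(N_LATENT, N_ACTIONS))

def emit(s):
    # Rich observation generated from latent state s.
    return EMIT_MEANS[s] + 0.1 * rng.normal(size=OBS_DIM)

# Component 1: unsupervised decoder (k-means as a stand-in for any
# unsupervised learning algorithm with a polynomial sample complexity guarantee).
warmup_obs = np.array([emit(rng.integers(N_LATENT)) for _ in range(2000)])
decoder = KMeans(n_clusters=N_LATENT, n_init=10, random_state=0).fit(warmup_obs)
decode = lambda o: int(decoder.predict(o.reshape(1, -1))[0])

# Component 2: optimistic tabular Q-learning on decoded states
# (a stand-in for any no-regret tabular RL algorithm).
Q = np.full((HORIZON, N_LATENT, N_ACTIONS), HORIZON, dtype=float)  # optimistic initialization
counts = np.zeros_like(Q)

for episode in range(500):
    s_latent = rng.integers(N_LATENT)
    for h in range(HORIZON):
        z = decode(emit(s_latent))              # the agent only sees the decoded observation
        a = int(np.argmax(Q[h, z]))
        r = REWARD[s_latent, a]
        s_next = rng.choice(N_LATENT, p=TRANS[s_latent, a])
        z_next = decode(emit(s_next))
        counts[h, z, a] += 1
        lr = 1.0 / counts[h, z, a]
        target = r + (np.max(Q[h + 1, z_next]) if h + 1 < HORIZON else 0.0)
        Q[h, z, a] += lr * (target - Q[h, z, a])
        s_latent = s_next

print("Greedy values at step 0 (per decoded state):", Q[0].max(axis=1).round(2))
```

Because the tabular component works entirely in the decoded state space, its sample complexity scales with the number of latent states rather than the number of observations, which is the intuition behind the paper's main guarantee.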
Pages: 13
Related Papers
50 items in total (first 10 shown)
  • [1] Tarbouriech, Jean; Pirotta, Matteo; Valko, Michal; Lazaric, Alessandro. A Provably Efficient Sample Collection Strategy for Reinforcement Learning. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021.
  • [2] Jin, Chi; Yang, Zhuoran; Wang, Zhaoran; Jordan, Michael I. Provably Efficient Reinforcement Learning with Linear Function Approximation. Mathematics of Operations Research, 2023, 48(3): 1496-1521.
  • [3] Liu, Yao; Swaminathan, Adith; Agarwal, Alekh; Brunskill, Emma. Provably Good Batch Reinforcement Learning Without Great Exploration. Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020.
  • [4] Kong, Dingwen; Yang, Lin F. Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
  • [5] Mavrin, Borislav; Yao, Hengshuai; Kong, Linglong; Wu, Kaiwen; Yu, Yaoliang. Distributional Reinforcement Learning for Efficient Exploration. International Conference on Machine Learning, Vol. 97, 2019.
  • [6] Wu, Jingfeng; Braverman, Vladimir; Yang, Lin F. Gap-Dependent Unsupervised Exploration for Reinforcement Learning. International Conference on Artificial Intelligence and Statistics, Vol. 151, 2022.
  • [7] Wang, Lingxiao; Yang, Zhuoran; Wang, Zhaoran. Provably Efficient Causal Reinforcement Learning with Confounded Observational Data. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021.
  • [8] Zhou, Dongruo; He, Jiafan; Gu, Quanquan. Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping. International Conference on Machine Learning, Vol. 139, 2021.
  • [9] Uehara, Masatoshi; Sekhari, Ayush; Kallus, Nathan; Lee, Jason D.; Sun, Wen. Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
  • [10] Cipollone, Roberto; Jonsson, Anders; Ronca, Alessandro; Talebi, Mohammad Sadegh. Provably Efficient Offline Reinforcement Learning in Regular Decision Processes. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.