Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning

Cited by: 0
Authors
Feng, Fei [1 ]
Wang, Ruosong [2 ]
Yin, Wotao [1 ]
Du, Simon S. [3 ]
Yang, Lin F. [1 ]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90024 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA USA
[3] Univ Washington, Seattle, WA 98195 USA
Keywords
Bernoulli mixture models; Subspace; Algorithm
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems (Tang et al., 2017; Bellemare et al., 2016), we investigate when this paradigm is provably efficient. We study episodic Markov decision processes with rich observations generated from a small number of latent states. We present a general algorithmic framework that is built upon two components: an unsupervised learning algorithm and a no-regret tabular RL algorithm. Theoretically, we prove that as long as the unsupervised learning algorithm enjoys a polynomial sample complexity guarantee, we can find a near-optimal policy with sample complexity polynomial in the number of latent states, which is significantly smaller than the number of observations. Empirically, we instantiate our framework on a class of hard exploration problems to demonstrate the practicality of our theory.
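As a concrete illustration of the two-component framework described in the abstract, below is a minimal, hypothetical Python sketch: a k-means decoder stands in for the unsupervised learning component, and optimistic tabular Q-learning stands in for the no-regret tabular RL component, run on a toy block MDP whose rich observations are noisy emissions from a few latent states. The toy environment, the choice of k-means and Q-learning, and all names are illustrative assumptions, not the paper's exact construction.

```python
# Hypothetical sketch of the two-component framework: (1) an unsupervised learner
# decodes rich observations into a small number of latent states, (2) a tabular RL
# algorithm is run on the decoded states. Environment and algorithm choices below
# are assumptions for illustration only.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy "block MDP": a few latent states emit high-dimensional noisy observations.
N_LATENT, N_ACTIONS, OBS_DIM, HORIZON = 4, 2, 16, 5
EMIT_MEANS = rng.normal(size=(N_LATENT, OBS_DIM))                      # observation center per latent state
TRANS = rng.dirichlet(np.ones(N_LATENT), size=(N_LATENT, N_ACTIONS))   # latent transition kernel
REWARD = rng.uniform(size=(N_LATENT, N_ACTIONS))

def emit(s):
    # Rich observation generated from latent state s.
    return EMIT_MEANS[s] + 0.1 * rng.normal(size=OBS_DIM)

# Component 1: unsupervised decoder (k-means as a stand-in for any
# unsupervised learning algorithm with a polynomial sample complexity guarantee).
warmup_obs = np.array([emit(rng.integers(N_LATENT)) for _ in range(2000)])
decoder = KMeans(n_clusters=N_LATENT, n_init=10, random_state=0).fit(warmup_obs)
decode = lambda o: int(decoder.predict(o.reshape(1, -1))[0])

# Component 2: optimistic tabular Q-learning on decoded states
# (a stand-in for any no-regret tabular RL algorithm).
Q = np.full((HORIZON, N_LATENT, N_ACTIONS), HORIZON, dtype=float)  # optimistic initialization
counts = np.zeros_like(Q)

for episode in range(500):
    s_latent = rng.integers(N_LATENT)
    for h in range(HORIZON):
        z = decode(emit(s_latent))              # the agent only sees the decoded observation
        a = int(np.argmax(Q[h, z]))
        r = REWARD[s_latent, a]
        s_next = rng.choice(N_LATENT, p=TRANS[s_latent, a])
        z_next = decode(emit(s_next))
        counts[h, z, a] += 1
        lr = 1.0 / counts[h, z, a]
        target = r + (np.max(Q[h + 1, z_next]) if h + 1 < HORIZON else 0.0)
        Q[h, z, a] += lr * (target - Q[h, z, a])
        s_latent = s_next

print("Greedy values at step 0 (per decoded state):", Q[0].max(axis=1).round(2))
```

Because the tabular component works entirely in the decoded state space, its sample complexity scales with the number of latent states rather than the number of observations, which is the intuition behind the paper's main guarantee.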
Pages: 13
Related Papers
50 items in total (first 10 shown)
  • [1] Tarbouriech, Jean; Pirotta, Matteo; Valko, Michal; Lazaric, Alessandro. A Provably Efficient Sample Collection Strategy for Reinforcement Learning. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021.
  • [2] Jin, Chi; Yang, Zhuoran; Wang, Zhaoran; Jordan, Michael I. Provably Efficient Reinforcement Learning with Linear Function Approximation. Mathematics of Operations Research, 2023, 48(3): 1496-1521.
  • [3] Liu, Yao; Swaminathan, Adith; Agarwal, Alekh; Brunskill, Emma. Provably Good Batch Reinforcement Learning Without Great Exploration. Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020.
  • [4] Kong, Dingwen; Yang, Lin F. Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
  • [5] Mavrin, Borislav; Yao, Hengshuai; Kong, Linglong; Wu, Kaiwen; Yu, Yaoliang. Distributional Reinforcement Learning for Efficient Exploration. International Conference on Machine Learning, Vol. 97, 2019.
  • [6] Wu, Jingfeng; Braverman, Vladimir; Yang, Lin F. Gap-Dependent Unsupervised Exploration for Reinforcement Learning. International Conference on Artificial Intelligence and Statistics, Vol. 151, 2022.
  • [7] Wang, Lingxiao; Yang, Zhuoran; Wang, Zhaoran. Provably Efficient Causal Reinforcement Learning with Confounded Observational Data. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021.
  • [8] Zhou, Dongruo; He, Jiafan; Gu, Quanquan. Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping. International Conference on Machine Learning, Vol. 139, 2021.
  • [9] Uehara, Masatoshi; Sekhari, Ayush; Kallus, Nathan; Lee, Jason D.; Sun, Wen. Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
  • [10] Cipollone, Roberto; Jonsson, Anders; Ronca, Alessandro; Talebi, Mohammad Sadegh. Provably Efficient Offline Reinforcement Learning in Regular Decision Processes. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.