Pure Exploration in Episodic Fixed-Horizon Markov Decision Processes

Times cited: 0
Authors
Putta, Sudeep Raja [1]
Tulabandhula, Theja [2]
Affiliations
[1] Conduent Labs India, Bangalore, Karnataka, India
[2] University of Illinois, Chicago, IL, USA
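Keywords
Reinforcement Learning; Pure Exploration; Multi-Armed Bandit; Markov Decision Process
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract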
Multi-Armed Bandit (MAB) problems extend naturally to Markov Decision Processes (MDPs). We extend the Best Arm Identification problem to episodic fixed-horizon MDPs. In this setting, the goal of an agent interacting with the MDP is to reach high confidence about the optimal policy in as few episodes as possible. We propose Posterior Sampling for Pure Exploration (PSPE), a Bayesian algorithm for pure exploration in MDPs. We show empirically that PSPE achieves deep exploration. The number of episodes PSPE needs to reach a fixed confidence value is exponentially lower than for random exploration. It is also lower than for reward-maximizing algorithms such as Posterior Sampling for Reinforcement Learning (PSRL).
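The abstract describes PSPE only at a high level. The sketch below illustrates the underlying Bayesian pure-exploration idea in the simplest special case the paper extends: best arm identification in a multi-armed bandit, which can be viewed as a single-state MDP with horizon 1. It uses Top-Two Thompson Sampling (Russo, 2016), a known bandit method. It is a swapped-in illustration, not the authors' PSPE algorithm.

The sketch makes the following assumptions:
- Rewards are Bernoulli, and each arm has a Beta(1, 1) prior.
- "Confidence" is the posterior probability that the leading arm is optimal, estimated by Monte Carlo.
- The function name `top_two_thompson_bai` and all its parameters are hypothetical.

```python
import random


def top_two_thompson_bai(true_means, beta=0.5, confidence=0.95,
                         n_conf_samples=500, max_rounds=10_000, seed=0):
    """Fixed-confidence best-arm identification for a Bernoulli bandit using
    Top-Two Thompson Sampling (an illustrative bandit method, not PSPE itself).

    Returns (estimated best arm, rounds used, posterior confidence).
    """
    rng = random.Random(seed)
    k = len(true_means)
    # Beta(1, 1) prior for every arm: a = successes + 1, b = failures + 1.
    a = [1] * k
    b = [1] * k

    def sample_best():
        # Draw one mean per arm from its Beta posterior; return the argmax.
        draws = [rng.betavariate(a[i], b[i]) for i in range(k)]
        return max(range(k), key=draws.__getitem__)

    def posterior_confidence():
        # Monte Carlo estimate of P(arm is optimal | data); return the top arm.
        counts = [0] * k
        for _ in range(n_conf_samples):
            counts[sample_best()] += 1
        best = max(range(k), key=counts.__getitem__)
        return best, counts[best] / n_conf_samples

    best, conf = posterior_confidence()
    for t in range(1, max_rounds + 1):
        arm = sample_best()  # the "leader"
        if rng.random() > beta:
            # With probability 1 - beta, play a "challenger": resample until a
            # different arm looks best (capped so it cannot loop forever).
            for _ in range(100):
                challenger = sample_best()
                if challenger != arm:
                    arm = challenger
                    break
        # Simulated Bernoulli reward; the agent never sees true_means directly.
        reward = 1 if rng.random() < true_means[arm] else 0
        a[arm] += reward
        b[arm] += 1 - reward
        # Stop as soon as the posterior is confident enough about one arm.
        best, conf = posterior_confidence()
        if conf >= confidence:
            return best, t, conf
    return best, max_rounds, conf


if __name__ == "__main__":
    print(top_two_thompson_bai([0.1, 0.3, 0.9], confidence=0.99))
```

The sampling rule above is where this sketch differs from a random-exploration baseline. That baseline would replace the leader/challenger choice with `rng.randrange(k)` and keep the same posterior stopping test. The paper's MDP setting would also need a posterior over transitions and rewards, with planning over the full horizon. The sketch does not attempt that.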
Pages: 1703-1704
Page count: 2