Model-Based Reinforcement Learning Exploiting State-Action Equivalence

Cited by: 0
Authors
Asadi, Mahsa [1 ]
Talebi, Mohammad Sadegh [1 ]
Bourel, Hippolyte [1 ]
Maillard, Odalric-Ambrym [1 ]
Affiliations
[1] Inria Lille Nord Europe, Villeneuve-d'Ascq, France
Keywords
Reinforcement Learning; Regret; Confidence Bound; Equivalence;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Leveraging an equivalence property in the state-space of a Markov Decision Process (MDP) has been investigated in several studies. This paper studies equivalence structure in the reinforcement learning (RL) setup, where transition distributions are no longer assumed to be known. We present a notion of similarity between transition probabilities of various state-action pairs of an MDP, which naturally defines an equivalence structure in the state-action space. We present equivalence-aware confidence sets for the case where the learner knows the underlying structure in advance. These sets are provably smaller than their corresponding equivalence-oblivious counterparts. In the more challenging case of an unknown equivalence structure, we present an algorithm called ApproxEquivalence that seeks to find an (approximate) equivalence structure, and define confidence sets using the approximate equivalence. To illustrate the efficacy of the presented confidence sets, we present C-UCRL as a natural modification of UCRL2 for RL in undiscounted MDPs. In the case of a known equivalence structure, we show that C-UCRL improves over UCRL2 in terms of regret by a factor of √(SA/C), in any communicating MDP with S states, A actions, and C classes, which corresponds to a massive improvement when C ≪ SA. To the best of our knowledge, this is the first work providing regret bounds for RL when an equivalence structure in the MDP is efficiently exploited. In the case of an unknown equivalence structure, we show through numerical experiments that C-UCRL combined with ApproxEquivalence outperforms UCRL2 in ergodic MDPs.
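To make the pooling intuition behind the √(SA/C) factor concrete, the sketch below shows how aggregating transition counts across the state-action pairs of one known equivalence class shrinks an L1-style confidence radius by roughly the square root of the class size. This is a minimal illustration, not the paper's implementation: it assumes equivalent pairs share an identical next-state distribution, omits the profile alignment and the rest of the C-UCRL machinery, and all function and variable names are illustrative.

import numpy as np

# Minimal sketch (not the paper's code): equivalence-aware estimation.
# Pooling the samples of all state-action pairs in one equivalence class
# divides the L1 confidence radius by roughly sqrt(class size), which is
# the origin of the sqrt(SA/C) regret improvement stated in the abstract.
# Assumption: equivalent pairs share an identical next-state distribution.

def l1_confidence_radius(n_samples, n_states, delta):
    # Weissman-style L1 deviation bound for an empirical distribution
    # built from n_samples observations over n_states outcomes.
    n_samples = max(int(n_samples), 1)
    return np.sqrt(2.0 * (np.log(2.0 ** n_states - 2.0) + np.log(1.0 / delta)) / n_samples)

def pooled_estimate(counts, pairs_in_class):
    # Empirical next-state distribution from the pooled transition counts
    # of every (state, action) pair in a single equivalence class.
    total = sum(counts[sa] for sa in pairs_in_class)
    n_samples = total.sum()
    return total / max(n_samples, 1), n_samples

# Toy usage: three equivalent pairs, four next states, 30 samples each.
rng = np.random.default_rng(0)
true_p = np.array([0.5, 0.2, 0.2, 0.1])
counts = {sa: rng.multinomial(30, true_p) for sa in [(0, 0), (1, 0), (2, 1)]}

p_hat, n_pooled = pooled_estimate(counts, list(counts))
print("pooled estimate:", p_hat)
print("pooled radius:", l1_confidence_radius(n_pooled, 4, 0.05))
print("single-pair radius:", l1_confidence_radius(30, 4, 0.05))  # ~sqrt(3) times larger

The classes are taken as known in this toy example; in the unknown-structure setting the paper's ApproxEquivalence instead estimates the classes from data, which the sketch does not attempt.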
Pages: 204 - 219
Number of pages: 16
Related Papers
50 in total
  • [1] Scaling Up Q-Learning via Exploiting State-Action Equivalence
    Lyu, Yunlian
    Côme, Aymeric
    Zhang, Yijie
    Talebi, Mohammad Sadegh
    [J]. ENTROPY, 2023, 25 (04)
  • [2] A REINFORCEMENT LEARNING MODEL USING DETERMINISTIC STATE-ACTION SEQUENCES
    Murata, Makoto
    Ozawa, Seiichi
    [J]. INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2010, 6 (02): 577 - 590
  • [3] For SALE: State-Action Representation Learning for Deep Reinforcement Learning
    Fujimoto, Scott
    Chang, Wei-Di
    Smith, Edward J.
    Gu, Shixiang Shane
    Precup, Doina
    Meger, David
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [4] Enhancing visual reinforcement learning with State-Action Representation
    Yan, Mengbei
    Lyu, Jiafei
    Li, Xiu
    [J]. KNOWLEDGE-BASED SYSTEMS, 2024, 304
  • [5] The Value Equivalence Principle for Model-Based Reinforcement Learning
    Grimm, Christopher
    Barreto, Andre
    Singh, Satinder
    Silver, David
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [6] PROJECTED STATE-ACTION BALANCING WEIGHTS FOR OFFLINE REINFORCEMENT LEARNING
    Wang, Jiayi
    Qi, Zhengling
    Wong, Raymond K. W.
    [J]. ANNALS OF STATISTICS, 2023, 51 (04): 1639 - 1665
  • [7] Exploiting Generalization in the Subspaces for Faster Model-Based Reinforcement Learning
    Hashemzadeh, Maryam
    Hosseini, Reshad
    Ahmadabadi, Majid Nili
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (06) : 1635 - 1650
  • [8] Efficient Reinforcement Learning Using State-Action Uncertainty with Multiple Heads
    Aizu, Tomoharu
    Oba, Takeru
    Ukita, Norimichi
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VIII, 2023, 14261 : 184 - 196
  • [9] Speeding up Tabular Reinforcement Learning Using State-Action Similarities
    Rosenfeld, Ariel
    Taylor, Matthew E.
    Kraus, Sarit
    [J]. AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017: 1722 - 1724
  • [10] Swarm Reinforcement Learning Methods for Problems with Continuous State-Action Space
    Iima, Hitoshi
    Kuroe, Yasuaki
    Emoto, Kazuo
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2011, : 2173 - 2180