Learning reward machines: A study in partially observable reinforcement learning 

Cited by: 2
Authors
Icarte, Rodrigo Toro [1 ,6 ]
Klassen, Toryn Q. [3 ,4 ]
Valenzano, Richard [5 ]
Castro, Margarita P. [2 ,6 ]
Waldie, Ethan [3 ]
Mcilraith, Sheila A. [3 ,4 ]
Affiliations
[1] Pontificia Univ Catolica Chile, Dept Comp Sci, Santiago, RM, Chile
[2] Pontificia Univ Catolica Chile PUC, Dept Ind & Syst Engn, Santiago, RM, Chile
[3] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada
[4] Vector Inst Artificial Intelligence, Toronto, ON, Canada
[5] Toronto Metropolitan Univ, Toronto, ON, Canada
[6] Ctr Nacl Inteligencia Artificial CENIA, Santiago, RM, Chile
Keywords
Reinforcement learning; Reward machines; Partial observability; Automata learning; Abstractions; Non-Markovian environments;
D O I
10.1016/j.artint.2023.103989
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement Learning (RL) is a machine learning paradigm wherein an artificial agent interacts with an environment with the purpose of learning behaviour that maximizes the expected cumulative reward it receives from the environment. Reward machines (RMs) provide a structured, automata-based representation of a reward function that enables an RL agent to decompose an RL problem into structured subproblems that can be efficiently learned via off-policy learning. Here we show that RMs can be learned from experience, instead of being specified by the user, and that the resulting problem decomposition can be used to effectively solve partially observable RL problems. We pose the task of learning RMs as a discrete optimization problem where the objective is to find an RM that decomposes the problem into a set of subproblems such that the combination of their optimal memoryless policies is an optimal policy for the original problem. We show the effectiveness of this approach on three partially observable domains, where it significantly outperforms A3C, PPO, and ACER, and discuss its advantages, limitations, and broader potential. © 2023 Elsevier B.V. All rights reserved.
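To make the automata-based representation concrete, the following is a minimal sketch of a reward machine for a hypothetical "get coffee, then deliver it to the office" task. The class, state names, and event labels are illustrative assumptions for this sketch, not the authors' implementation: an RM is a finite-state machine whose transitions fire on high-level propositions detected in the environment and whose outputs are rewards, so the RM state acts as external memory for a partially observable agent.

```python
# Minimal reward machine (RM) sketch: transitions are keyed by
# (RM state, observed proposition) and return (next state, reward).
# All names here are illustrative, not from the paper's code.

class RewardMachine:
    def __init__(self, initial_state, transitions):
        # transitions: {(state, proposition): (next_state, reward)}
        self.initial_state = initial_state
        self.transitions = transitions

    def step(self, state, proposition):
        """Advance the RM on one labelled event.

        Events with no listed transition self-loop with zero reward.
        """
        return self.transitions.get((state, proposition), (state, 0.0))


# RM for a hypothetical "get coffee, then deliver to the office" task:
# u0 = no coffee yet, u1 = carrying coffee, u2 = delivered (terminal).
rm = RewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "coffee"): ("u1", 0.0),  # picked up coffee: remember it
        ("u1", "office"): ("u2", 1.0),  # delivered coffee: reward
    },
)

# Simulate a labelled trace coming from the environment. Note the first
# "office" event gives no reward: the RM state supplies the memory that
# the raw (partial) observation lacks.
state = rm.initial_state
total_reward = 0.0
for event in ["office", "coffee", "office"]:
    state, r = rm.step(state, event)
    total_reward += r
print(state, total_reward)  # -> u2 1.0
```

In the paper's setting each RM state also induces a subproblem (learn a policy for that state), which is what allows the off-policy decomposition described in the abstract.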
Pages: 27
Related Papers
50 records in total
  • [1] Learning Reward Machines for Partially Observable Reinforcement Learning
    Icarte, Rodrigo Toro
    Waldie, Ethan
    Klassen, Toryn Q.
    Valenzano, Richard
    Castro, Margarita P.
    McIlraith, Sheila A.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [2] Inverse Reinforcement Learning in Partially Observable Environments
    Choi, Jaedeug
    Kim, Kee-Eung
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2011, 12 : 691 - 730
  • [3] Reinforcement Learning with Stochastic Reward Machines
    Corazza, Jan
    Gavran, Ivan
    Neider, Daniel
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 6429 - 6436
  • [4] Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning
    Park, Giseung
    Choi, Sungho
    Sung, Youngchul
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 7941 - 7948
  • [5] Inverse Reinforcement Learning in Partially Observable Environments
    Choi, Jaedeug
    Kim, Kee-Eung
    [J]. 21ST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-09), PROCEEDINGS, 2009, : 1028 - 1033
  • [6] Partially Observable Reinforcement Learning for Sustainable Active Surveillance
    Chen, Hechang
    Yang, Bo
    Liu, Jiming
    [J]. KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2018, PT II, 2018, 11062 : 425 - 437
  • [7] Regret Minimization for Partially Observable Deep Reinforcement Learning
    Jin, Peter
    Keutzer, Kurt
    Levine, Sergey
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [8] Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning
    Icarte, Rodrigo Toro
    Klassen, Toryn Q.
    Valenzano, Richard
    McIlraith, Sheila A.
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2022, 73 : 173 - 208