Online Learning of non-Markovian Reward Models

Cited by: 1
Authors
Rens, Gavin [1 ]
Raskin, Jean-Francois [2 ]
Reynouard, Raphael [2 ]
Marra, Giuseppe [1 ]
Affiliations
[1] Katholieke Univ Leuven, DTAI Grp, Leuven, Belgium
[2] Univ Libre Bruxelles, Brussels, Belgium
Source
ICAART: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 2, 2021
Keywords
non-Markovian Rewards; Learning Mealy Machines; Angluin's Algorithm
DOI
10.5220/0010212000740086
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
There are situations in which an agent should receive rewards only after having accomplished a series of previous tasks, that is, rewards are non-Markovian. One natural and quite general way to represent history-dependent rewards is via a Mealy machine. In our formal setting, we consider a Markov decision process (MDP) that models the dynamics of the environment in which the agent evolves and a Mealy machine synchronized with this MDP to formalize the non-Markovian reward function. While the MDP is known to the agent, the reward function is unknown and must be learned. Our approach to overcoming this challenge is to use Angluin's L* active learning algorithm to learn a Mealy machine representing the underlying non-Markovian reward machine (MRM). Formal methods are used to determine the optimal strategy for answering so-called membership queries posed by L*. Moreover, we prove that the expected reward achieved will eventually be at least a given, reasonable value provided by a domain expert. We evaluate our framework on two problems. The results show that using L* to learn an MRM in a non-Markovian reward decision process is effective.
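For intuition, here is a minimal Python sketch of the paper's central object, a Mealy machine serving as a reward model. The state names, labels, and rewards below are hypothetical illustrations, not taken from the paper; the sketch only shows how a reward can depend on the history of labels emitted by the MDP rather than on the current MDP state alone.

```python
from dataclasses import dataclass, field

@dataclass
class MealyRewardMachine:
    """Minimal Mealy reward machine: reads MDP labels, emits rewards."""
    initial_state: str
    # transitions[(machine_state, label)] -> (next_machine_state, reward)
    transitions: dict = field(default_factory=dict)

    def run(self, labels):
        """Feed a sequence of labels observed in the MDP; return total reward."""
        state, total = self.initial_state, 0.0
        for lab in labels:
            state, reward = self.transitions[(state, lab)]
            total += reward
        return total

# Hypothetical two-task example: reward 1.0 only after seeing 'a' and then 'b'.
mrm = MealyRewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "a"): ("u1", 0.0),  # first task accomplished, no reward yet
        ("u0", "b"): ("u0", 0.0),  # 'b' before 'a' achieves nothing
        ("u1", "a"): ("u1", 0.0),
        ("u1", "b"): ("u2", 1.0),  # second task after the first: reward
        ("u2", "a"): ("u2", 0.0),
        ("u2", "b"): ("u2", 0.0),
    },
)
print(mrm.run(["b", "a", "b"]))  # 1.0: the reward is history-dependent
```

In the paper's setting, a machine of this kind is what L* must learn; each membership query asks for the output on a given label sequence, and is answered by executing a strategy in the known MDP that realizes that sequence.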
Pages: 74-86
Page count: 13
Related Papers
50 records in total
  • [1] An Approach of Transforming Non-Markovian Reward to Markovian Reward
    Miao, Ruixuan
    Lu, Xu
    Cui, Jin
    STRUCTURED OBJECT-ORIENTED FORMAL LANGUAGE AND METHOD, SOFL+MSVL 2022, 2023, 13854 : 12 - 29
  • [2] Memory for Reward in Probabilistic Choice: Markovian and Non-Markovian Properties
    Davis, D. G. S.
    Staddon, J. E. R.
    BEHAVIOUR, 1990, 114 : 37 - 64
  • [3] Hidden Non-Markovian Reward Models: Virtual Stochastic Sensors for Hybrid Systems
    Krull, Claudia
    Horton, Graham
    2012 WINTER SIMULATION CONFERENCE (WSC), 2012
  • [4] Evaluation of Safe Reinforcement Learning with CoMirror Algorithm in a Non-Markovian Reward Problem
    Miyashita, Megumi
    Yano, Shiro
    Kondo, Toshiyuki
    INTELLIGENT AUTONOMOUS SYSTEMS 17, IAS-17, 2023, 577 : 62 - 72
  • [5] Non-Markovian fluctuations in Markovian models of protein dynamics
    Dua, Arti
    Adhikari, R.
    JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT, 2011
  • [6] Bayesian reinforcement learning in Markovian and non-Markovian tasks
    Ez-zizi, Adnane
    Farrell, Simon
    Leslie, David
    2015 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2015, : 579 - 586
  • [7] Reinforcement learning in non-Markovian environments
    Chandak, Siddharth
    Shah, Pratik
    Borkar, Vivek S.
    Dodhia, Parth
    SYSTEMS & CONTROL LETTERS, 2024, 185
  • [8] Non-Markovian Speedup Dynamics in Markovian and Non-Markovian Channels
    Nie, Jing
    Liang, Yingshuang
    Wang, Biao
    Yang, Xiuyi
    INTERNATIONAL JOURNAL OF THEORETICAL PHYSICS, 2021, 60 (08) : 2889 - 2900
  • [9] Reinforcement Learning with Non-Markovian Rewards
    Gaon, Maor
    Brafman, Ronen I.
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 3980 - 3987