Online Learning of non-Markovian Reward Models

Cited by: 1
Authors
Rens, Gavin [1]
Raskin, Jean-Francois [2]
Reynouard, Raphael [2]
Marra, Giuseppe [1]
Affiliations
[1] Katholieke Univ Leuven, DTAI Grp, Leuven, Belgium
[2] Univ Libre Bruxelles, Brussels, Belgium
Source
ICAART: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 2 | 2021
Keywords
non-Markovian Rewards; Learning Mealy Machines; Angluin's Algorithm; MACHINES;
DOI
10.5220/0010212000740086
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104; 0812; 0835; 1405
Abstract
There are situations in which an agent should receive rewards only after having accomplished a series of previous tasks, that is, rewards are non-Markovian. One natural and quite general way to represent history-dependent rewards is via a Mealy machine. In our formal setting, we consider a Markov decision process (MDP) that models the dynamics of the environment in which the agent evolves and a Mealy machine synchronized with this MDP to formalize the non-Markovian reward function. While the MDP is known by the agent, the reward function is unknown to the agent and must be learned. Our approach to overcome this challenge is to use Angluin's L* active learning algorithm to learn a Mealy machine representing the underlying non-Markovian reward machine (MRM). Formal methods are used to determine the optimal strategy for answering so-called membership queries posed by L*. Moreover, we prove that the expected reward achieved will eventually be at least as much as a given, reasonable value provided by a domain expert. We evaluate our framework on two problems. The results show that using L* to learn an MRM in a non-Markovian reward decision process is effective.
Pages: 74-86
Number of pages: 13
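
The abstract's central idea, a Mealy machine that emits history-dependent rewards, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the class name MealyRewardMachine, the observation labels "key", "goal" and "none", the state names q0 and q1, and the reward values. The membership query is also simplified: it is answered by reading the hidden machine directly, whereas in the setting the abstract describes the agent has to act in the MDP to obtain such answers.

```python
from typing import Dict, List, Tuple

# (machine state, observation) -> (next machine state, reward)
Transition = Dict[Tuple[str, str], Tuple[str, float]]


class MealyRewardMachine:
    """Hypothetical Mealy machine whose outputs are rewards."""

    def __init__(self, initial: str, transitions: Transition) -> None:
        self.initial = initial
        self.transitions = transitions

    def rewards(self, observations: List[str]) -> List[float]:
        """Return the reward emitted at each step of an observation sequence."""
        state = self.initial
        emitted = []
        for obs in observations:
            state, reward = self.transitions[(state, obs)]
            emitted.append(reward)
        return emitted


# Illustrative task: "goal" is rewarded only if "key" was observed earlier.
transitions: Transition = {
    ("q0", "key"): ("q1", 0.0),
    ("q0", "goal"): ("q0", 0.0),
    ("q0", "none"): ("q0", 0.0),
    ("q1", "key"): ("q1", 0.0),
    ("q1", "goal"): ("q0", 1.0),
    ("q1", "none"): ("q1", 0.0),
}
mrm = MealyRewardMachine("q0", transitions)


def membership_query(word: List[str]) -> List[float]:
    """Simplified stand-in for a membership query: the reward sequence for `word`.

    Assumption: the query is answered by the hidden machine itself rather
    than by executing a strategy in an environment.
    """
    return mrm.rewards(word)


# The same observation "goal" yields different rewards depending on history,
# which is what makes the reward function non-Markovian.
without_key = membership_query(["goal"])
with_key = membership_query(["key", "none", "goal"])
```

In this sketch the reward for "goal" depends on whether "key" appeared earlier in the sequence, so no function of the current observation alone can reproduce it; tracking the machine state restores the missing history.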