Online Learning of non-Markovian Reward Models

Times Cited: 1
Authors
Rens, Gavin [1 ]
Raskin, Jean-Francois [2 ]
Reynouard, Raphael [2 ]
Marra, Giuseppe [1 ]
Affiliations
[1] Katholieke Univ Leuven, DTAI Grp, Leuven, Belgium
[2] Univ Libre Bruxelles, Brussels, Belgium
Source
ICAART: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 2 | 2021
Keywords
non-Markovian Rewards; Learning Mealy Machines; Angluin's Algorithm
DOI
10.5220/0010212000740086
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
There are situations in which an agent should receive rewards only after having accomplished a series of previous tasks; that is, the rewards are non-Markovian. One natural and quite general way to represent such history-dependent rewards is via a Mealy machine. In our formal setting, we consider a Markov decision process (MDP) that models the dynamics of the environment in which the agent evolves, together with a Mealy machine synchronized with this MDP that formalizes the non-Markovian reward function. While the MDP is known to the agent, the reward function is unknown and must be learned. Our approach to overcoming this challenge is to use Angluin's L* active learning algorithm to learn a Mealy machine representing the underlying non-Markovian reward machine (MRM). Formal methods are used to determine the optimal strategy for answering the so-called membership queries posed by L*. Moreover, we prove that the expected reward achieved will eventually be at least a given, reasonable value provided by a domain expert. We evaluate our framework on two problems. The results show that using L* to learn an MRM in a non-Markovian reward decision process is effective.
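
As a concrete illustration of the reward-machine formalism the abstract describes, below is a minimal Python sketch (not the authors' implementation) of a Mealy machine that emits history-dependent rewards; the states, labels, and the two-step toy task are illustrative assumptions, not the paper's experiments.

    # Sketch, assuming a labeled MDP: each step produces a label that the
    # Mealy machine consumes, emitting a reward and updating its state.
    class MealyRewardMachine:
        def __init__(self, initial_state, delta, rho):
            self.initial_state = initial_state
            self.delta = delta  # (machine state, label) -> next machine state
            self.rho = rho      # (machine state, label) -> reward emitted
            self.state = initial_state

        def reset(self):
            self.state = self.initial_state

        def step(self, label):
            # Consume one label from the MDP trajectory; emit a reward.
            reward = self.rho[(self.state, label)]
            self.state = self.delta[(self.state, label)]
            return reward

    # Toy non-Markovian task: reward 1 only after seeing 'a' and then 'b',
    # so the reward depends on history, not just the current label.
    delta = {("q0", "a"): "q1", ("q0", "b"): "q0",
             ("q1", "a"): "q1", ("q1", "b"): "q2",
             ("q2", "a"): "q2", ("q2", "b"): "q2"}
    rho = {("q0", "a"): 0, ("q0", "b"): 0,
           ("q1", "a"): 0, ("q1", "b"): 1,
           ("q2", "a"): 0, ("q2", "b"): 0}

    mrm = MealyRewardMachine("q0", delta, rho)
    print([mrm.step(l) for l in ["b", "a", "b"]])  # -> [0, 0, 1]

In the paper's framework, L* poses membership queries about such label sequences; per the abstract, the agent answers them by executing strategies, determined with formal methods, in the known MDP.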
Pages: 74-86
Page Count: 13