Faithful and Effective Reward Schemes for Model-Free Reinforcement Learning of Omega-Regular Objectives

Cited by: 7
Authors
Hahn, Ernst Moritz [1]
Perez, Mateo [2]
Schewe, Sven [3]
Somenzi, Fabio [2]
Trivedi, Ashutosh [2]
Wojtczak, Dominik [3]
Affiliations
[1] Univ Twente, Enschede, Netherlands
[2] Univ Colorado, Boulder, CO 80309 USA
[3] Univ Liverpool, Liverpool, Merseyside, England
Funding
National Science Foundation (USA); Engineering and Physical Sciences Research Council (UK)
Keywords
STOCHASTIC GAMES; COMPLEXITY; AUTOMATA;
DOI
10.1007/978-3-030-59152-6_6
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Omega-regular properties, specified using linear time temporal logic or various forms of omega-automata, find increasing use in specifying the objectives of reinforcement learning (RL). The key problem that arises is that of faithful and effective translation of the objective into a scalar reward for model-free RL. A recent approach exploits Büchi automata with restricted nondeterminism to reduce the search for an optimal policy for an omega-regular property to that for a simple reachability objective. A possible drawback of this translation is that reachability rewards are sparse, being reaped only at the end of each episode. Another approach reduces the search for an optimal policy to an optimization problem with two interdependent discount parameters. While this approach provides denser rewards than the reduction to reachability, it is not easily mapped to off-the-shelf RL algorithms. We propose a reward scheme that reduces the search for an optimal policy to an optimization problem with a single discount parameter that produces dense rewards and is compatible with off-the-shelf RL algorithms. Finally, we report an experimental comparison of these and other reward schemes for model-free RL with omega-regular objectives.
Pages: 108-124
Number of pages: 17
Related papers
50 in total
  • [1] Omega-Regular Objectives in Model-Free Reinforcement Learning
    Hahn, Ernst Moritz
    Perez, Mateo
    Schewe, Sven
    Somenzi, Fabio
    Trivedi, Ashutosh
    Wojtczak, Dominik
    TOOLS AND ALGORITHMS FOR THE CONSTRUCTION AND ANALYSIS OF SYSTEMS, PT I, 2019, 11427 : 395 - 412
  • [2] Model-Free Reinforcement Learning for Lexicographic Omega-Regular Objectives
    Hahn, Ernst Moritz
    Perez, Mateo
    Schewe, Sven
    Somenzi, Fabio
    Trivedi, Ashutosh
    Wojtczak, Dominik
    FORMAL METHODS, FM 2021, 2021, 13047 : 142 - 159
  • [3] Limit Reachability for Model-Free Reinforcement Learning of ω-Regular Objectives
    Hahn, Ernst Moritz
    Perez, Mateo
    Schewe, Sven
    Somenzi, Fabio
    Trivedi, Ashutosh
    Wojtczak, Dominik
    PROCEEDINGS OF THE 5TH INTERNATIONAL WORKSHOP ON SYMBOLIC-NUMERIC METHODS FOR REASONING ABOUT CPS AND IOT (SNR 2019), 2019, : 16 - 18
  • [4] A PAC Learning Algorithm for LTL and Omega-Regular Objectives in MDPs
    Perez, Mateo
    Somenzi, Fabio
    Trivedi, Ashutosh
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 19, 2024, : 21510 - 21517
  • [5] Mungojerrie: Linear-Time Objectives in Model-Free Reinforcement Learning
    Hahn, Ernst Moritz
    Perez, Mateo
    Schewe, Sven
    Somenzi, Fabio
    Trivedi, Ashutosh
    Wojtczak, Dominik
    TOOLS AND ALGORITHMS FOR THE CONSTRUCTION AND ANALYSIS OF SYSTEMS, PT I, TACAS 2023, 2023, 13993 : 527 - 545
  • [6] Model-free average reward multi-step reinforcement learning
    Hu, Guanghua
    Wu, Cangpu
    Kongzhi Lilun Yu Yinyong/Control Theory and Applications, 2000, 17 (05): : 660 - 664
  • [7] Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems
    Cai, Tianchi
    Bao, Shenliao
    Jiang, Jiyan
    Zhou, Shiji
    Zhang, Wenpeng
    Gu, Lihong
    Gu, Jinjie
    Zhang, Guannan
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 2179 - 2183
  • [8] Model-Free Reinforcement Learning for Stochastic Games with Linear Temporal Logic Objectives
    Bozkurt, Alper Kamil
    Wang, Yu
    Zavlanos, Michael M.
    Pajic, Miroslav
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 10649 - 10655
  • [9] Poster Abstract: Model-Free Reinforcement Learning for Symbolic Automata-encoded Objectives
    Balakrishnan, Anand
    Jaksic, Stefan
    Aguilar, Edgar A.
    Nickovic, Dejan
    Deshmukh, Jyotirmoy V.
    HSCC 2022: PROCEEDINGS OF THE 25TH ACM INTERNATIONAL CONFERENCE ON HYBRID SYSTEMS: COMPUTATION AND CONTROL (PART OF CPS-IOT WEEK 2022), 2022,
  • [10] Research on Improvement of Model-Free Average Reward Reinforcement Learning and Its Simulation Experiment
    Chen, Wei
    Zhai, Zhenkun
    Li, Xiong
    Guo, Jing
    Wang, Jie
    CCDC 2009: 21ST CHINESE CONTROL AND DECISION CONFERENCE, VOLS 1-6, PROCEEDINGS, 2009, : 4933 - 4936