Using chains of bottleneck transitions to decompose and solve reinforcement learning tasks with hidden states

Cited by: 4
Authors
Aydin, Huseyin [1 ]
Cilden, Erkin [2 ]
Polat, Faruk [1 ]
Affiliations
[1] Middle East Technical University, Department of Computer Engineering, Ankara, Turkey
[2] STM Defence Technologies Engineering & Trade Inc., Ankara, Turkey
Keywords
Reinforcement learning; Task decomposition; Chains of bottleneck transitions
DOI
10.1016/j.future.2022.03.016
Chinese Library Classification
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
Reinforcement learning is known to underperform in large and ambiguous problem domains under partial observability. In such cases, a proper decomposition of the task can improve and accelerate the learning process. Even ambiguous and complex problems that are not solvable by conventional methods become easier to handle through a suitable problem decomposition, followed by the application of machine learning methods to the sub-problems. As in most real-life problems, the decomposition of a task usually stems from the sequence of sub-tasks that must be achieved in order to accomplish the main task. In this study, assuming that unambiguous states are provided in advance, the agent constructs a decomposition of the problem based on a set of chains of bottleneck transitions, which are sequences of unambiguous and critical transitions leading to the goal state. At the higher level, an agent trains its sub-agents to extract sub-policies corresponding to the sub-tasks, namely two successive transitions in any chain, and learns the value of each sub-policy at the abstract level. The experimental study demonstrates that an early decomposition based on useful bottleneck transitions eliminates the need for excessive memory and improves the learning performance of the agent. It is also shown that knowing the correct order of bottleneck transitions in the decomposition results in faster construction of the solution. (c) 2022 Elsevier B.V. All rights reserved.
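To make the two-level scheme in the abstract concrete, the following is a minimal sketch under assumed tabular Q-learning details: each sub-task spans two successive bottleneck transitions in a chain, a sub-agent learns a sub-policy for that segment, and a higher-level agent learns the value of invoking each sub-policy at the abstract level. All class names, signatures, and the semi-Markov-style discounting are illustrative assumptions, not the paper's actual implementation.

# Illustrative sketch only; the tabular Q-learning details and all names
# below are assumptions, not the method as published in the paper.
import random
from collections import defaultdict

class SubAgent:
    """Learns a sub-policy for one sub-task: reaching the next bottleneck
    transition in the chain from the previous one."""
    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)  # maps (observation, action) -> value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, obs):
        # Epsilon-greedy action selection over the sub-task's Q-values.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(obs, a)])

    def update(self, obs, action, reward, next_obs, done):
        # One-step Q-learning backup; 'done' marks the sub-task's
        # terminating bottleneck transition.
        target = reward
        if not done:
            target += self.gamma * max(self.q[(next_obs, a)] for a in self.actions)
        self.q[(obs, action)] += self.alpha * (target - self.q[(obs, action)])

class AbstractAgent:
    """Learns the value of each sub-policy over abstract states, i.e.
    positions along the chain of bottleneck transitions."""
    def __init__(self, n_subtasks, alpha=0.1, gamma=0.95):
        self.values = [0.0] * n_subtasks
        self.alpha, self.gamma = alpha, gamma

    def update(self, subtask, cumulative_reward, steps, next_value):
        # Semi-Markov-style backup: discount the next abstract value by
        # the number of primitive steps the sub-policy took to finish.
        target = cumulative_reward + (self.gamma ** steps) * next_value
        self.values[subtask] += self.alpha * (target - self.values[subtask])

if __name__ == "__main__":
    # Dummy updates showing the intended data flow between the two levels.
    sub = SubAgent(actions=[0, 1])
    sub.update(obs="o0", action=0, reward=1.0, next_obs="o1", done=False)
    top = AbstractAgent(n_subtasks=3)
    top.update(subtask=0, cumulative_reward=1.0, steps=5,
               next_value=top.values[1])

In this sketch, executing the overall task would mean invoking each SubAgent in chain order until its terminating bottleneck transition fires, then reporting the accumulated reward and step count back to the AbstractAgent for the higher-level value update.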
Pages: 153-168 (16 pages)
Related Papers
50 in total (10 listed)
  • [1] Learning to Predict Phases of Manipulation Tasks as Hidden States
    Kroemer, Oliver
    van Hoof, Herke
    Neumann, Gerhard
    Peters, Jan
    2014 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2014, : 4009 - 4014
  • [2] Reinforcement learning of a continuous motor sequence with hidden states
    Arie, Hiroaki
    Ogata, Tetsuya
    Tani, Jun
    Sugano, Shigeki
    ADVANCED ROBOTICS, 2007, 21 (10) : 1215 - 1229
  • [3] Deep Reinforcement Learning with Hidden Layers on Future States
    Kameko, Hirotaka
    Suzuki, Jun
    Mizukami, Naoki
    Tsuruoka, Yoshimasa
    COMPUTER GAMES (CGW 2017), 2018, 818 : 46 - 60
  • [4] Compact Frequency Memory for Reinforcement Learning with Hidden States
    Aydin, Huseyin
    Cilden, Erkin
    Polat, Faruk
    PRINCIPLES AND PRACTICE OF MULTI-AGENT SYSTEMS (PRIMA 2019), 2019, 11873 : 425 - 433
  • [5] Implementation of reinforcement learning strategies in the synthesis of neuromodels to solve medical diagnostics tasks
    Leoshchenko, Serhii
    Oliinyk, Andrii
    Subbotin, Sergey
    Lytvyn, Viktor
    Korniienko, Oleksandr
    IDDM 2021: INFORMATICS & DATA-DRIVEN MEDICINE: PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON INFORMATICS & DATA-DRIVEN MEDICINE (IDDM 2021), 2021, 3038 : 34 - 43
  • [6] On using reinforcement learning to solve sparse linear systems
    Kuefler, Erik
    Chen, Tzu-Yi
    COMPUTATIONAL SCIENCE - ICCS 2008, PT 1, 2008, 5101 : 955 - 964
  • [7] A Biologically Plausible Architecture of the Striatum to Solve Context-Dependent Reinforcement Learning Tasks
    Shivkumar, Sabyasachi
    Muralidharan, Vignesh
    Chakravarthy, V. Srinivasa
    FRONTIERS IN NEURAL CIRCUITS, 2017, 11
  • [8] Landmark Based Reward Shaping in Reinforcement Learning with Hidden States
    Demir, Alper
    Cilden, Erkin
    Polat, Faruk
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 1922 - 1924
  • [9] Unconscious reinforcement learning of hidden brain states supported by confidence
    Cortese, Aurelio
    Lau, Hakwan
    Kawato, Mitsuo
    NATURE COMMUNICATIONS, 2020, 11 (01)