Using chains of bottleneck transitions to decompose and solve reinforcement learning tasks with hidden states

Cited by: 4
Authors
Aydin, Huseyin [1 ]
Cilden, Erkin [2 ]
Polat, Faruk [1 ]
Affiliations
[1] Middle East Technical University, Department of Computer Engineering, Ankara, Turkey
[2] STM Defence Technologies Engineering & Trade Inc., Ankara, Turkey
Keywords
Reinforcement learning; Task decomposition; Chains of bottleneck transitions
DOI
10.1016/j.future.2022.03.016
Chinese Library Classification
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
Reinforcement learning is known to underperform in large and ambiguous problem domains under partial observability. In such cases, a proper decomposition of the task can improve and accelerate the learning process. Even ambiguous and complex problems that are not solvable by conventional methods become easier to handle through a suitable problem decomposition, followed by the application of machine learning methods to the sub-problems. As in most real-life problems, the decomposition of a task usually stems from the sequence of sub-tasks that must be achieved in order to accomplish the main task. In this study, assuming that unambiguous states are provided in advance, the agent constructs a decomposition of the problem based on a set of chains of bottleneck transitions, which are sequences of unambiguous and critical transitions leading to the goal state. At the higher level, an agent trains its sub-agents to extract sub-policies corresponding to the sub-tasks, namely two successive transitions in any chain, and learns the value of each sub-policy at the abstract level. The experimental study demonstrates that an early decomposition based on useful bottleneck transitions eliminates the need for excessive memory and improves the learning performance of the agent. It is also shown that knowing the correct order of bottleneck transitions in the decomposition results in faster construction of the solution. (c) 2022 Elsevier B.V. All rights reserved.
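To make the two-level scheme in the abstract concrete, the following is a minimal sketch under assumed tabular Q-learning details: each sub-task spans two successive bottleneck transitions in a chain, a sub-agent learns a sub-policy for that segment, and a higher-level agent learns the value of invoking each sub-policy at the abstract level. All class names, signatures, and the semi-Markov-style discounting are illustrative assumptions, not the paper's actual implementation.

# Illustrative sketch only; the tabular Q-learning details and all names
# below are assumptions, not the method as published in the paper.
import random
from collections import defaultdict

class SubAgent:
    """Learns a sub-policy for one sub-task: reaching the next bottleneck
    transition in the chain from the previous one."""
    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)  # maps (observation, action) -> value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, obs):
        # Epsilon-greedy action selection over the sub-task's Q-values.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(obs, a)])

    def update(self, obs, action, reward, next_obs, done):
        # One-step Q-learning backup; 'done' marks the sub-task's
        # terminating bottleneck transition.
        target = reward
        if not done:
            target += self.gamma * max(self.q[(next_obs, a)] for a in self.actions)
        self.q[(obs, action)] += self.alpha * (target - self.q[(obs, action)])

class AbstractAgent:
    """Learns the value of each sub-policy over abstract states, i.e.
    positions along the chain of bottleneck transitions."""
    def __init__(self, n_subtasks, alpha=0.1, gamma=0.95):
        self.values = [0.0] * n_subtasks
        self.alpha, self.gamma = alpha, gamma

    def update(self, subtask, cumulative_reward, steps, next_value):
        # Semi-Markov-style backup: discount the next abstract value by
        # the number of primitive steps the sub-policy took to finish.
        target = cumulative_reward + (self.gamma ** steps) * next_value
        self.values[subtask] += self.alpha * (target - self.values[subtask])

if __name__ == "__main__":
    # Dummy updates showing the intended data flow between the two levels.
    sub = SubAgent(actions=[0, 1])
    sub.update(obs="o0", action=0, reward=1.0, next_obs="o1", done=False)
    top = AbstractAgent(n_subtasks=3)
    top.update(subtask=0, cumulative_reward=1.0, steps=5,
               next_value=top.values[1])

In this sketch, executing the overall task would mean invoking each SubAgent in chain order until its terminating bottleneck transition fires, then reporting the accumulated reward and step count back to the AbstractAgent for the higher-level value update.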
Pages: 153-168 (16 pages)
Related Papers
50 in total (10 listed)
  • [1] Learning to Predict Phases of Manipulation Tasks as Hidden States
    Kroemer, Oliver
    van Hoof, Herke
    Neumann, Gerhard
    Peters, Jan
    2014 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2014, : 4009 - 4014
  • [2] Reinforcement learning of a continuous motor sequence with hidden states
    Arie, Hiroaki
    Ogata, Tetsuya
    Tani, Jun
    Sugano, Shigeki
    ADVANCED ROBOTICS, 2007, 21 (10) : 1215 - 1229
  • [3] Deep Reinforcement Learning with Hidden Layers on Future States
    Kameko, Hirotaka
    Suzuki, Jun
    Mizukami, Naoki
    Tsuruoka, Yoshimasa
    COMPUTER GAMES (CGW 2017), 2018, 818 : 46 - 60
  • [4] Compact Frequency Memory for Reinforcement Learning with Hidden States
    Aydin, Huseyin
    Cilden, Erkin
    Polat, Faruk
    PRINCIPLES AND PRACTICE OF MULTI-AGENT SYSTEMS (PRIMA 2019), 2019, 11873 : 425 - 433
  • [5] Implementation of reinforcement learning strategies in the synthesis of neuromodels to solve medical diagnostics tasks
    Leoshchenko, Serhii
    Oliinyk, Andrii
    Subbotin, Sergey
    Lytvyn, Viktor
    Korniienko, Oleksandr
    IDDM 2021: INFORMATICS & DATA-DRIVEN MEDICINE: PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON INFORMATICS & DATA-DRIVEN MEDICINE (IDDM 2021), 2021, 3038 : 34 - 43
  • [6] On using reinforcement learning to solve sparse linear systems
    Kuefler, Erik
    Chen, Tzu-Yi
    COMPUTATIONAL SCIENCE - ICCS 2008, PT 1, 2008, 5101 : 955 - 964
  • [7] A Biologically Plausible Architecture of the Striatum to Solve Context-Dependent Reinforcement Learning Tasks
    Shivkumar, Sabyasachi
    Muralidharan, Vignesh
    Chakravarthy, V. Srinivasa
    FRONTIERS IN NEURAL CIRCUITS, 2017, 11
  • [8] Landmark Based Reward Shaping in Reinforcement Learning with Hidden States
    Demir, Alper
    Cilden, Erkin
    Polat, Faruk
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 1922 - 1924
  • [9] Unconscious reinforcement learning of hidden brain states supported by confidence
    Cortese, Aurelio
    Lau, Hakwan
    Kawato, Mitsuo
    NATURE COMMUNICATIONS, 2020, 11 (01)