Using chains of bottleneck transitions to decompose and solve reinforcement learning tasks with hidden states

Cited by: 4
Authors
Aydin, Huseyin [1 ]
Cilden, Erkin [2 ]
Polat, Faruk [1 ]
Affiliations
[1] Middle East Tech Univ, Dept Comp Engn, Ankara, Turkey
[2] STM Def Technol Engn & Trade Inc, Ankara, Turkey
Keywords
Reinforcement learning; Task decomposition; Chains of bottleneck transitions
DOI
10.1016/j.future.2022.03.016
Chinese Library Classification
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
Reinforcement learning is known to underperform in large and ambiguous problem domains under partial observability. In such cases, a proper decomposition of the task can improve and accelerate the learning process. Even ambiguous and complex problems that are not solvable by conventional methods become easier to handle through a suitable problem decomposition, followed by the application of machine learning methods to the resulting sub-problems. As in most real-life problems, the decomposition of a task usually stems from the sequence of sub-tasks that must be completed for the main task to be accomplished. In this study, assuming that unambiguous states are provided in advance, the agent constructs a decomposition of the problem based on a set of chains of bottleneck transitions, which are sequences of unambiguous and critical transitions leading to the goal state. At the higher level, the agent trains its sub-agents to extract sub-policies corresponding to the sub-tasks, where each sub-task is delimited by two successive transitions in a chain, and learns the value of each sub-policy at the abstract level. The experimental study demonstrates that an early decomposition based on useful bottleneck transitions eliminates the need for excessive memory and improves the learning performance of the agent. It is also shown that knowing the correct order of bottleneck transitions in the decomposition leads to faster construction of the solution. (c) 2022 Elsevier B.V. All rights reserved.
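To make the hierarchical scheme described in the abstract concrete, the following is a minimal sketch, not the authors' implementation: the consecutive pairs in a chain of bottleneck transitions define the sub-tasks, each sub-task is handled by a tabular Q-learning sub-agent, and the higher-level agent keeps an abstract value per sub-policy. The example chain and all names (SubAgent, bottleneck_chain, abstract_value) are illustrative assumptions.

import random
from collections import defaultdict

class SubAgent:
    """Learns a sub-policy that carries the agent from one bottleneck
    transition to the next, via epsilon-greedy tabular Q-learning."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)          # (observation, action) -> value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, obs):
        if random.random() < self.epsilon:   # explore
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(obs, a)])

    def update(self, obs, action, reward, next_obs):
        best_next = max(self.q[(next_obs, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(obs, action)]
        self.q[(obs, action)] += self.alpha * td_error

# Hypothetical chain of bottleneck transitions, assumed given in advance;
# each pair of successive transitions delimits one sub-task.
bottleneck_chain = ["pick_key", "open_door", "reach_goal"]
sub_tasks = list(zip(bottleneck_chain, bottleneck_chain[1:]))

# One sub-agent per sub-task, plus an abstract value per sub-policy that
# the higher-level agent would update from observed sub-task returns.
sub_agents = {task: SubAgent(actions=[0, 1, 2, 3]) for task in sub_tasks}
abstract_value = {task: 0.0 for task in sub_tasks}

In the paper's setting, the chain entries would be the unambiguous bottleneck transitions supplied in advance, and the abstract values would be updated from the returns observed when the sub-policies are executed in chain order.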
Pages: 153-168
Number of pages: 16