Using chains of bottleneck transitions to decompose and solve reinforcement learning tasks with hidden states

Cited by: 4
Authors
Aydin, Huseyin [1 ]
Cilden, Erkin [2 ]
Polat, Faruk [1 ]
Affiliations
[1] Middle East Tech Univ, Dept Comp Engn, Ankara, Turkey
[2] STM Def Technol Engn & Trade Inc, Ankara, Turkey
Keywords
Reinforcement learning; Task decomposition; Chains of bottleneck transitions
DOI
10.1016/j.future.2022.03.016
Chinese Library Classification
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
Reinforcement learning is known to underperform in large and ambiguous problem domains under partial observability. In such cases, a proper decomposition of the task can improve and accelerate the learning process. Even ambiguous and complex problems that are not solvable by conventional methods become easier to handle through a suitable problem decomposition, followed by the application of machine learning methods to the resulting sub-problems. As in most real-life problems, the decomposition of a task usually stems from the sequence of sub-tasks that must be completed for the main task to be accomplished. In this study, assuming that unambiguous states are provided in advance, the agent constructs a decomposition of the problem based on a set of chains of bottleneck transitions, which are sequences of unambiguous and critical transitions leading to the goal state. At the higher level, the agent trains its sub-agents to extract sub-policies corresponding to the sub-tasks, where each sub-task is delimited by two successive transitions in a chain, and learns the value of each sub-policy at the abstract level. The experimental study demonstrates that an early decomposition based on useful bottleneck transitions eliminates the need for excessive memory and improves the learning performance of the agent. It is also shown that knowing the correct order of bottleneck transitions in the decomposition leads to faster construction of the solution. (c) 2022 Elsevier B.V. All rights reserved.
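To make the hierarchical scheme described in the abstract concrete, the following is a minimal sketch, not the authors' implementation: the consecutive pairs in a chain of bottleneck transitions define the sub-tasks, each sub-task is handled by a tabular Q-learning sub-agent, and the higher-level agent keeps an abstract value per sub-policy. The example chain and all names (SubAgent, bottleneck_chain, abstract_value) are illustrative assumptions.

import random
from collections import defaultdict

class SubAgent:
    """Learns a sub-policy that carries the agent from one bottleneck
    transition to the next, via epsilon-greedy tabular Q-learning."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)          # (observation, action) -> value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, obs):
        if random.random() < self.epsilon:   # explore
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(obs, a)])

    def update(self, obs, action, reward, next_obs):
        best_next = max(self.q[(next_obs, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(obs, action)]
        self.q[(obs, action)] += self.alpha * td_error

# Hypothetical chain of bottleneck transitions, assumed given in advance;
# each pair of successive transitions delimits one sub-task.
bottleneck_chain = ["pick_key", "open_door", "reach_goal"]
sub_tasks = list(zip(bottleneck_chain, bottleneck_chain[1:]))

# One sub-agent per sub-task, plus an abstract value per sub-policy that
# the higher-level agent would update from observed sub-task returns.
sub_agents = {task: SubAgent(actions=[0, 1, 2, 3]) for task in sub_tasks}
abstract_value = {task: 0.0 for task in sub_tasks}

In the paper's setting, the chain entries would be the unambiguous bottleneck transitions supplied in advance, and the abstract values would be updated from the returns observed when the sub-policies are executed in chain order.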
Pages: 153-168
Number of pages: 16