Using chains of bottleneck transitions to decompose and solve reinforcement learning tasks with hidden states

Cited by: 4
Authors
Aydin, Huseyin [1 ]
Cilden, Erkin [2 ]
Polat, Faruk [1 ]
Affiliations
[1] Middle East Tech Univ, Dept Comp Engn, Ankara, Turkey
[2] STM Def Technol Engn & Trade Inc, Ankara, Turkey
Keywords
Reinforcement learning; Task decomposition; Chains of bottleneck transitions;
DOI
10.1016/j.future.2022.03.016
Chinese Library Classification (CLC)
TP301 [Theory, Methods];
Subject classification code
081202 ;
Abstract
Reinforcement learning is known to underperform in large and ambiguous problem domains under partial observability. In such cases, a proper decomposition of the task can improve and accelerate the learning process. Even ambiguous and complex problems that are not solvable by conventional methods become easier to handle through a convenient problem decomposition, followed by the application of machine learning methods to the sub-problems. As in most real-life problems, the decomposition of a task usually stems from the sequence of sub-tasks that must be achieved in order to complete the main task. In this study, assuming that unambiguous states are provided in advance, the agent constructs a decomposition of the problem based on a set of chains of bottleneck transitions, which are sequences of unambiguous and critical transitions leading to the goal state. At the higher level, an agent trains its sub-agents to extract sub-policies corresponding to the sub-tasks, namely two successive transitions in any chain, and learns the value of each sub-policy at the abstract level. The experimental study demonstrates that an early decomposition based on useful bottleneck transitions eliminates the need for excessive memory and improves the learning performance of the agent. It is also shown that knowing the correct order of bottleneck transitions in the decomposition results in faster construction of the solution. (c) 2022 Elsevier B.V. All rights reserved.
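The decomposition described in the abstract can be illustrated with a minimal sketch: a chain of bottleneck transitions is split into sub-tasks, each spanning two successive transitions, and a higher-level agent maintains a tabular value for each sub-policy. All names and the toy chain below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of chain-based task decomposition. The chain,
# function names, and class are illustrative only; the paper's actual
# method additionally trains low-level sub-agents under hidden state.

from collections import defaultdict


def subtasks_from_chain(chain):
    """Each sub-task runs between two successive bottleneck transitions."""
    return [(chain[i], chain[i + 1]) for i in range(len(chain) - 1)]


class AbstractAgent:
    """Tabular value learner over sub-policies, one per sub-task."""

    def __init__(self, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)  # value of each sub-policy
        self.alpha, self.gamma = alpha, gamma

    def update(self, subtask, reward, next_best):
        # One-step Q-style update at the abstract (sub-task) level:
        # the target bootstraps from the best successor sub-policy.
        target = reward + self.gamma * next_best
        self.q[subtask] += self.alpha * (target - self.q[subtask])


# Toy chain of bottleneck transitions leading to the goal.
chain = ["pick_key", "open_door", "cross_room", "reach_goal"]
tasks = subtasks_from_chain(chain)
print(tasks)  # three sub-tasks, each a pair of successive transitions

agent = AbstractAgent()
# Reward arrives on completing the final sub-task (reaching the goal).
agent.update(tasks[-1], reward=1.0, next_best=0.0)
print(agent.q[tasks[-1]])
```

In this reading, knowing the correct order of bottleneck transitions matters because it fixes which sub-task follows which, so the abstract-level values can propagate backward along the chain.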
Pages: 153-168 (16 pages)