Using chains of bottleneck transitions to decompose and solve reinforcement learning tasks with hidden states

Cited by: 4
Authors
Aydin, Huseyin [1 ]
Cilden, Erkin [2 ]
Polat, Faruk [1 ]
Affiliations
[1] Middle East Tech Univ, Dept Comp Engn, Ankara, Turkey
[2] STM Def Technol Engn & Trade Inc, Ankara, Turkey
Keywords
Reinforcement learning; Task decomposition; Chains of bottleneck transitions;
DOI
10.1016/j.future.2022.03.016
Chinese Library Classification (CLC)
TP301 [Theory, Methods];
Subject classification code
081202 ;
Abstract
Reinforcement learning is known to underperform in large and ambiguous problem domains under partial observability. In such cases, a proper decomposition of the task can improve and accelerate the learning process. Even ambiguous and complex problems that are not solvable by conventional methods become easier to handle through a convenient problem decomposition, followed by the application of machine learning methods to the sub-problems. As in most real-life problems, the decomposition of a task usually stems from the sequence of sub-tasks that must be achieved in order to complete the main task. In this study, assuming that unambiguous states are provided in advance, the agent constructs a decomposition of the problem based on a set of chains of bottleneck transitions, which are sequences of unambiguous and critical transitions leading to the goal state. At the higher level, an agent trains its sub-agents to extract sub-policies corresponding to the sub-tasks, namely two successive transitions in any chain, and learns the value of each sub-policy at the abstract level. The experimental study demonstrates that an early decomposition based on useful bottleneck transitions eliminates the need for excessive memory and improves the learning performance of the agent. It is also shown that knowing the correct order of bottleneck transitions in the decomposition results in faster construction of the solution. (c) 2022 Elsevier B.V. All rights reserved.
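The decomposition described in the abstract can be illustrated with a minimal sketch: a chain of bottleneck transitions is split into sub-tasks, each spanning two successive transitions, and a higher-level agent maintains a tabular value for each sub-policy. All names and the toy chain below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of chain-based task decomposition. The chain,
# function names, and class are illustrative only; the paper's actual
# method additionally trains low-level sub-agents under hidden state.

from collections import defaultdict


def subtasks_from_chain(chain):
    """Each sub-task runs between two successive bottleneck transitions."""
    return [(chain[i], chain[i + 1]) for i in range(len(chain) - 1)]


class AbstractAgent:
    """Tabular value learner over sub-policies, one per sub-task."""

    def __init__(self, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)  # value of each sub-policy
        self.alpha, self.gamma = alpha, gamma

    def update(self, subtask, reward, next_best):
        # One-step Q-style update at the abstract (sub-task) level:
        # the target bootstraps from the best successor sub-policy.
        target = reward + self.gamma * next_best
        self.q[subtask] += self.alpha * (target - self.q[subtask])


# Toy chain of bottleneck transitions leading to the goal.
chain = ["pick_key", "open_door", "cross_room", "reach_goal"]
tasks = subtasks_from_chain(chain)
print(tasks)  # three sub-tasks, each a pair of successive transitions

agent = AbstractAgent()
# Reward arrives on completing the final sub-task (reaching the goal).
agent.update(tasks[-1], reward=1.0, next_best=0.0)
print(agent.q[tasks[-1]])
```

In this reading, knowing the correct order of bottleneck transitions matters because it fixes which sub-task follows which, so the abstract-level values can propagate backward along the chain.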
Pages: 153-168 (16 pages)