Using chains of bottleneck transitions to decompose and solve reinforcement learning tasks with hidden states

Cited by: 4
Authors
Aydin, Huseyin [1 ]
Cilden, Erkin [2 ]
Polat, Faruk [1 ]
Affiliations
[1] Middle East Tech Univ, Dept Comp Engn, Ankara, Turkey
[2] STM Def Technol Engn & Trade Inc, Ankara, Turkey
Keywords
Reinforcement learning; Task decomposition; Chains of bottleneck transitions;
DOI
10.1016/j.future.2022.03.016
CLC Number
TP301 [Theory, Methods];
Discipline Code
081202;
Abstract
Reinforcement learning is known to underperform in large and ambiguous problem domains under partial observability. In such cases, a proper decomposition of the task can improve and accelerate the learning process. Even ambiguous and complex problems that conventional methods cannot solve become easier to handle through a convenient problem decomposition, followed by the application of machine learning methods to the sub-problems. As in most real-life problems, the decomposition of a task usually stems from the sequence of sub-tasks that must be achieved to get the main task done. In this study, assuming that unambiguous states are provided in advance, the agent constructs a decomposition of the problem based on a set of chains of bottleneck transitions, which are sequences of unambiguous and critical transitions leading to the goal state. At the higher level, an agent trains its sub-agents to extract sub-policies corresponding to the sub-tasks, namely two successive transitions in any chain, and learns the value of each sub-policy at the abstract level. An experimental study demonstrates that an early decomposition based on useful bottleneck transitions eliminates the need for excessive memory and improves the agent's learning performance. It is also shown that knowing the correct order of bottleneck transitions in the decomposition results in faster construction of the solution. (c) 2022 Elsevier B.V. All rights reserved.
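The decomposition described in the abstract can be illustrated with a minimal sketch: a chain of bottleneck states splits the task into sub-tasks (one per pair of successive chain elements), a tabular Q-learning sub-agent is trained for each sub-task, and a high-level controller executes the sub-policies in chain order. The corridor environment, the chosen chain, and all hyper-parameters below are illustrative assumptions for exposition, not the paper's actual algorithm or domains.

```python
import random

random.seed(0)

N = 11             # toy corridor with states 0..10 (assumed environment)
ACTIONS = [-1, 1]  # move left / move right

def step(s, a):
    """Deterministic corridor dynamics, clipped at the walls."""
    return max(0, min(N - 1, s + a))

def train_subpolicy(start, subgoal, episodes=300, alpha=0.5, gamma=0.9, eps=0.2):
    """Tabular Q-learning for one sub-task: reach `subgoal` starting from `start`."""
    Q = {(s, i): 0.0 for s in range(N) for i in range(len(ACTIONS))}
    for _ in range(episodes):
        s = start
        for _ in range(4 * N):                      # cap episode length
            if random.random() < eps:               # epsilon-greedy exploration
                i = random.randrange(len(ACTIONS))
            else:
                i = max(range(len(ACTIONS)), key=lambda j: Q[(s, j)])
            s2 = step(s, ACTIONS[i])
            r = 1.0 if s2 == subgoal else -0.01     # reward only at the sub-goal
            best_next = 0.0 if s2 == subgoal else max(
                Q[(s2, j)] for j in range(len(ACTIONS)))
            Q[(s, i)] += alpha * (r + gamma * best_next - Q[(s, i)])
            s = s2
            if s == subgoal:                        # sub-goal is terminal
                break
    return lambda s: ACTIONS[max(range(len(ACTIONS)), key=lambda j: Q[(s, j)])]

# A chain of bottleneck states leading to the goal; each successive pair
# defines one sub-task, mirroring the paper's high-level decomposition.
chain = [0, 5, 10]
subpolicies = [train_subpolicy(a, b) for a, b in zip(chain, chain[1:])]

# High-level execution: run each sub-policy until its sub-goal is reached.
s = chain[0]
for pi, subgoal in zip(subpolicies, chain[1:]):
    for _ in range(4 * N):
        if s == subgoal:
            break
        s = step(s, pi(s))
print(s)
```

Executing the chained sub-policies carries the agent through each bottleneck in order until the final state of the chain is reached; the high-level agent in the paper would additionally learn the value of each sub-policy at the abstract level, which this sketch omits.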
Pages: 153-168
Page count: 16