Swarm Reinforcement Learning Method Based on Hierarchical Q-Learning

Cited: 0
Authors:
Kuroe, Yasuaki [1 ]
Takeuchi, Kenya [1 ]
Maeda, Yutaka [1 ]
Affiliations:
[1] Kansai Univ, Fac Engn Sci, Suita, Osaka, Japan
Funding: Japan Society for the Promotion of Science (JSPS)
Keywords:
reinforcement learning method; partially observed Markov decision process; hierarchical Q-learning; swarm intelligence;
DOI: 10.1109/SSCI50451.2021.9659877
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Over the last few decades, reinforcement learning has attracted a great deal of attention and has been studied extensively. However, it is basically a trial-and-error scheme, so acquiring optimal strategies can require considerable computational time, and for large, complicated problems with many states optimal strategies may not be obtained at all. To resolve these problems we have proposed the swarm reinforcement learning method, which is inspired by multi-point search optimization methods. Swarm reinforcement learning has been studied extensively and its effectiveness has been confirmed on several problems, especially Markov decision processes in which the agents can fully observe the state of the environment. In many real-world problems, however, the agents cannot fully observe the environment; such problems are usually partially observable Markov decision processes (POMDPs). The purpose of this paper is to develop a swarm reinforcement learning method that can deal with POMDPs. We propose a swarm reinforcement learning method based on HQ-learning, a hierarchical extension of Q-learning in which a sequence of subagents, each pursuing its own subgoal, decomposes a POMDP into a series of simpler reactive subtasks. Experiments show that the proposed method can handle POMDPs and achieves higher performance than the original HQ-learning.
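To make the mechanics concrete, the sketch below shows one way a swarm of tabular Q-learners can periodically exchange value information, following the multi-point-search intuition described in the abstract. It is a minimal illustration, not the authors' algorithm: the QAgent class, the share_best blending rule with mixing rate beta, and all hyperparameters are assumptions introduced here, and the paper's method additionally makes each swarm member an HQ-learner (a chain of Q-learning subagents with learned subgoals) so that POMDPs can be handled.

import random
from collections import defaultdict

class QAgent:
    """One swarm member: an independent tabular Q-learner."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, eps=0.1):
        self.q = defaultdict(float)   # Q[(state, action)] -> estimated value
        self.n_actions = n_actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, s):
        # Epsilon-greedy action selection.
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q[(s, a)])

    def update(self, s, a, r, s2):
        # Standard one-step Q-learning backup.
        best_next = max(self.q[(s2, a2)] for a2 in range(self.n_actions))
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

def share_best(agents, returns, beta=0.5):
    # Information exchange step: pull every member's Q-table toward the
    # best-performing member's table (one of several exchange rules that
    # have been studied in swarm reinforcement learning).
    best = agents[max(range(len(agents)), key=lambda i: returns[i])]
    for ag in agents:
        if ag is best:
            continue
        for key in set(ag.q) | set(best.q):
            ag.q[key] += beta * (best.q[key] - ag.q[key])

# Usage sketch against a Gym-style environment `env` (hypothetical): run one
# episode per member, record each member's return, then call
# share_best(agents, returns) before the next generation of episodes.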
Pages: 8
Related papers (50 records in total):
  • [21] Parallel Implementation of Reinforcement Learning Q-Learning Technique for FPGA
    Da Silva, Lucileide M. D.
    Torquato, Matheus F.
    Fernandes, Marcelo A. C.
    IEEE ACCESS, 2019, 7 : 2782 - 2798
  • [22] Concurrent Q-learning: Reinforcement learning for dynamic goals and environments
    Ollington, RB
    Vamplew, PW
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2005, 20 (10) : 1037 - 1052
  • [23] Constraints Penalized Q-learning for Safe Offline Reinforcement Learning
    Xu, Haoran
    Zhan, Xianyuan
    Zhu, Xiangyu
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 8753 - 8760
  • [24] Deep Reinforcement Learning with Sarsa and Q-Learning: A Hybrid Approach
    Xu, Zhi-xiong
    Cao, Lei
    Chen, Xi-liang
    Li, Chen-xi
    Zhang, Yong-liang
    Lai, Jun
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (09) : 2315 - 2322
  • [25] Nested Q-learning of hierarchical control structures
    Digney, BL
    ICNN - 1996 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOLS. 1-4, 1996, : 161 - 166
  • [26] An Enhanced Ensemble Learning Method for Sentiment Analysis based on Q-learning
    Savargiv, Mohammad
    Masoumi, Behrooz
    Keyvanpour, Mohammad Reza
    IRANIAN JOURNAL OF SCIENCE AND TECHNOLOGY-TRANSACTIONS OF ELECTRICAL ENGINEERING, 2024, 48 (03) : 1261 - 1277
  • [27] Enhanced Machine Learning Algorithms: Deep Learning, Reinforcement Learning, and Q-Learning
    Park, Ji Su
    Park, Jong Hyuk
    JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2020, 16 (05): 1001 - 1007
  • [28] Nested Q-learning of hierarchical control structures
    Digney, BL
    ICNN - 1996 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOLS. 1-4, 1996, : 1676 - 1681
  • [29] Linear quadratic optimal control method based on output feedback inverse reinforcement Q-learning
    Liu, Wen
    Fan, Jia-Lu
    Xue, Wen-Qian
    Kongzhi Lilun Yu Yingyong/Control Theory and Applications, 2024, 41 (08): 1469 - 1479
  • [30] Decision-making method for vehicle longitudinal automatic driving based on reinforcement Q-learning
    Gao, Zhenhai
    Sun, Tianjun
    Xiao, Hongwei
    INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2019, 16 (03)