Hybrid MDP based integrated hierarchical Q-learning

Cited by: 0
|
Authors
ChunLin Chen
DaoYi Dong
Han-Xiong Li
Tzyh-Jong Tarn
Affiliations
[1] Nanjing University, Department of Control and System Engineering, and State Key Laboratory for Novel Software Technology
[2] Zhejiang University, Institute of Cyber-Systems and Control, State Key Laboratory of Industrial Control Technology
[3] University of New South Wales at the Australian Defence Force Academy, School of Engineering and Information Technology
[4] City University of Hong Kong, Department of Manufacturing Engineering and Engineering Management
[5] Washington University in St. Louis
Source
Science China Information Sciences | 2011 / Vol. 54
Keywords
reinforcement learning; hierarchical Q-learning; hybrid MDP; temporal abstraction
DOI
Not available
Abstract
As a widely used reinforcement learning method, Q-learning is bedeviled by the curse of dimensionality: the computational complexity grows dramatically with the size of the state-action space. To combat this difficulty, an integrated hierarchical Q-learning framework is proposed based on a hybrid Markov decision process (MDP) that uses temporal abstraction instead of the simple MDP. The learning process is naturally organized into multiple levels, e.g., a quantitative (lower) level and a qualitative (upper) level, which are modeled as an MDP and a semi-MDP (SMDP), respectively. This hierarchical control architecture constitutes a hybrid MDP as the model for hierarchical Q-learning, bridging the two levels of learning. The proposed hierarchical Q-learning scales up well and speeds up learning through the upper-level learning process, making it an effective integrated learning and control scheme for complex problems. Several experiments are carried out on a puzzle problem in a gridworld environment and a navigation control problem for a mobile robot. The experimental results demonstrate the effectiveness and efficiency of the proposed approach.
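To make the two-level structure concrete, here is a minimal sketch (Python, illustrative only and not the authors' implementation) of the pair of update rules such a hybrid MDP/SMDP architecture implies: a one-step Q-learning update at the quantitative (lower) level, and an SMDP Q-learning update at the qualitative (upper) level in which an abstract option runs for several primitive steps and its discounted accumulated reward drives the update. All class and function names, state/option encodings, and hyperparameters below are assumptions made for illustration.

```python
from collections import defaultdict

class HierarchicalQLearner:
    """Minimal two-level Q-learning sketch: lower level over the primitive MDP,
    upper level over temporally abstract options (SMDP). Illustrative only."""

    def __init__(self, alpha=0.1, gamma=0.95):
        self.alpha, self.gamma = alpha, gamma
        self.q_low = defaultdict(float)    # Q(s, a): quantitative (lower) level
        self.q_high = defaultdict(float)   # Q(z, o): qualitative (upper) level

    def update_low(self, s, a, r, s_next, actions):
        # Standard one-step Q-learning update on the primitive MDP.
        best_next = max(self.q_low[(s_next, a2)] for a2 in actions)
        td_error = r + self.gamma * best_next - self.q_low[(s, a)]
        self.q_low[(s, a)] += self.alpha * td_error

    def update_high(self, z, o, rewards, z_next, options):
        # SMDP Q-learning update: `rewards` holds the primitive rewards collected
        # while option o ran for tau steps; the discount is raised to tau accordingly.
        tau = len(rewards)
        R = sum(self.gamma ** k * r_k for k, r_k in enumerate(rewards))
        best_next = max(self.q_high[(z_next, o2)] for o2 in options)
        td_error = R + self.gamma ** tau * best_next - self.q_high[(z, o)]
        self.q_high[(z, o)] += self.alpha * td_error
```

In a gridworld setting like the paper's puzzle experiment, z might correspond to a coarse region of the environment and o to a macro-action such as "reach the next subgoal", while s and a remain the underlying cells and primitive moves.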
Pages: 2279-2294 (15 pages)
Related Papers
50 records in total
  • [31] An MDP Model-Based Reinforcement Learning Approach for Production Station Ramp-Up Optimization: Q-Learning Analysis
    Doltsinis, Stefanos
    Ferreira, Pedro
    Lohse, Niels
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2014, 44 (09) : 1125 - 1138
  • [32] Controlling Sequential Hybrid Evolutionary Algorithm by Q-Learning
    Zhang, Haotian
    Sun, Jianyong
    Bäck, Thomas
    Zhang, Qingfu
    Xu, Zongben
    IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE, 2023, 18 (01) : 84 - 103
  • [33] A Hybrid Fuzzy Q-Learning algorithm for robot navigation
    Gordon, Sean W.
    Reyes, Napoleon H.
    Barczak, Andre
    2011 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2011 : 2625 - 2631
  • [34] Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning
    Tan, Fuxiao
    Yan, Pengfei
    Guan, Xinping
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT IV, 2017, 10637 : 475 - 483
  • [35] Backward Q-learning: The combination of Sarsa algorithm and Q-learning
    Wang, Yin-Hao
    Li, Tzuu-Hseng S.
    Lin, Chih-Jui
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (09) : 2184 - 2193
  • [36] Intelligent preamble allocation for coexistence of mMTC/URLLC devices: A hierarchical Q-learning based approach
    Wang, Jiadai
    Xing, Chaochao
    Liu, Jiajia
    CHINA COMMUNICATIONS, 2023, 20 (08) : 44 - 53
  • [37] Hierarchical Q-Learning Based UAV Secure Communication against Multiple UAV Adaptive Eavesdroppers
    Liu, Jue
    Sha, Nan
    Yang, Weiwei
    Tu, Jia
    Yang, Lianxin
    WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2020, 2020
  • [40] Hierarchical state-abstracted and socially augmented Q-Learning for reducing complexity in agent-based learning
    Sun X.
    Mao T.
    Ray L.
    Shi D.
    Kralik J.
    Journal of Control Theory and Applications, 2011, 9 (03) : 440 - 450