Cloud Reasoning Model-based Exploration for Deep Reinforcement Learning

Cited by: 9
Authors
Li Chenxi [1]
Cao Lei [1]
Chen Xiliang [1]
Zhang Yongliang [1]
Xu Zhixiong [1]
Peng Hui [1]
Duan Liwen [2]
Affiliations
[1] PLA Univ Sci & Technol, Inst Command Informat Syst, Nanjing 210007, Jiangsu, Peoples R China
[2] Zhejiang Univ, Coll Mech Engn, Hangzhou 310027, Zhejiang, Peoples R China
Funding
China Postdoctoral Science Foundation
Keywords
Cloud reasoning; Deep reinforcement learning; Knowledge; Exploration strategy
DOI
10.11999/JEIT170347
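Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification code
0808; 0809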
Abstract
Reinforcement learning, which is self-improving and learns online, obtains a policy for a task by interacting with the environment. However, its trial-and-error mechanism usually requires a large number of training episodes. Knowledge encompasses both human experience and an understanding of the environment. This paper introduces qualitative rules into reinforcement learning and represents them with the cloud reasoning model, which then serves as a heuristic exploration strategy that guides action selection. An empirical evaluation in the OpenAI Gym environment "CartPole-v2" shows that the exploration strategy based on the cloud reasoning model significantly improves the performance of the learning process.
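The abstract does not give the paper's rules, cloud parameters, or the exact way cloud reasoning is coupled to the learner. The following is therefore only a minimal sketch under stated assumptions. Each qualitative concept is modeled as a normal cloud with expectation Ex, entropy En, and hyper-entropy He. An X-conditional cloud generator turns an observed state variable into a random certainty degree. The rule-guided action then replaces the uniform-random branch of epsilon-greedy, which is one plausible reading of "heuristic exploration strategy" and not confirmed by the source. The names `NormalCloud`, `cloud_action`, `select_action`, and `CARTPOLE_RULES` are hypothetical, and so are all parameter values. The state layout `[x, x_dot, theta, theta_dot]` and the actions (0 = push left, 1 = push right) follow the Gym CartPole convention.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class NormalCloud:
    """Qualitative concept as a normal cloud: expectation Ex, entropy En, hyper-entropy He."""
    ex: float
    en: float
    he: float

    def certainty(self, x: float, rng: random.Random) -> float:
        """X-conditional (antecedent) cloud generator: one random certainty degree for input x."""
        en_prime = rng.gauss(self.en, self.he)  # En' ~ N(En, He^2)
        if en_prime == 0.0:
            return 1.0 if x == self.ex else 0.0
        return math.exp(-((x - self.ex) ** 2) / (2.0 * en_prime ** 2))


# Hypothetical qualitative rules: (state index, antecedent cloud, consequent action).
# Parameter values are illustrative assumptions, not taken from the paper.
CARTPOLE_RULES = [
    (2, NormalCloud(-0.2, 0.07, 0.01), 0),  # IF pole angle is negative THEN push left
    (2, NormalCloud(0.2, 0.07, 0.01), 1),   # IF pole angle is positive THEN push right
    (3, NormalCloud(-1.0, 0.4, 0.05), 0),   # IF angular velocity is negative THEN push left
    (3, NormalCloud(1.0, 0.4, 0.05), 1),    # IF angular velocity is positive THEN push right
]


def cloud_action(state, rules, rng):
    """Fire every rule and return the action whose strongest rule has the highest certainty."""
    strength = {}
    for var_index, cloud, action in rules:
        mu = cloud.certainty(state[var_index], rng)
        strength[action] = max(strength.get(action, 0.0), mu)
    return max(strength, key=strength.get)


def select_action(q_values, state, epsilon, rules, rng):
    """Epsilon-greedy whose exploratory branch follows the cloud rules instead of a uniform draw."""
    if rng.random() < epsilon:
        return cloud_action(state, rules, rng)
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    rng = random.Random(0)
    state = [0.0, 0.0, 0.15, 0.5]  # positive pole angle and positive angular velocity
    print(select_action([0.2, 0.1], state, epsilon=1.0, rules=CARTPOLE_RULES, rng=rng))  # prints 1
```

Because He perturbs the entropy on every call, the certainty degrees are stochastic. Rule firing therefore stays soft near concept boundaries, while clear-cut states such as the example above still map reliably to the rule's action. In a full agent, `q_values` would come from the Q-network, and epsilon would typically be annealed over training.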
Pages: 244-248
Number of pages: 5