Cloud Reasoning Model-based Exploration for Deep Reinforcement Learning

Cited by: 9

Authors:

Li Chenxi [1]

Cao Lei [1]

Chen Xiliang [1]

Zhang Yongliang [1]

Xu Zhixiong [1]

Peng Hui [1]

Duan Liwen [2]

Affiliations:

[1] PLA Univ Sci & Technol, Inst Command Informat Syst, Nanjing 210007, Jiangsu, Peoples R China

[2] Zhejiang Univ, Coll Mech Engn, Hangzhou 310027, Zhejiang, Peoples R China

Source:

JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY | 2018 / Vol. 40 / No. 01

Funding:

China Postdoctoral Science Foundation;

Keywords:

Cloud reasoning; Deep reinforcement learning; Knowledge; Exploration strategy;

DOI:

10.11999/JEIT170347

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Reinforcement learning, which is self-improving and learns online, obtains a task policy by interacting with the environment. However, its trial-and-error mechanism usually requires a large number of training episodes. Knowledge includes human experience and cognition of the environment. This paper introduces qualitative rules into reinforcement learning and represents them with a cloud reasoning model, which serves as a heuristic exploration strategy to guide action selection. An empirical evaluation in the OpenAI Gym environment "CartPole-v0" shows that an exploration strategy based on the cloud reasoning model significantly improves the performance of the learning process.
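
The idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a standard normal cloud model with parameters (Ex, En, He) and an X-condition cloud generator for the membership degree. The two rules over the CartPole pole angle, their parameter values, and the names `x_condition_cloud`, `heuristic_action`, and `select_action` are hypothetical, since the abstract does not state the actual rules or how they are combined with the learned policy.

```python
import math
import random


def x_condition_cloud(x, ex, en, he, rng):
    """Membership degree of x in a normal cloud concept (Ex, En, He).

    En' is drawn from N(En, He^2), so repeated calls give slightly
    different degrees: this is the uncertainty the cloud model encodes.
    """
    en_prime = rng.gauss(en, he)
    if en_prime == 0.0:
        return 1.0 if x == ex else 0.0
    return math.exp(-((x - ex) ** 2) / (2.0 * en_prime ** 2))


# Hypothetical qualitative rules over the pole angle (radians).
# Assumption: action 0 pushes the cart left, action 1 pushes it right.
RULES = [
    {"ex": -0.1, "en": 0.05, "he": 0.005, "action": 0},  # pole leans left
    {"ex": 0.1, "en": 0.05, "he": 0.005, "action": 1},   # pole leans right
]


def heuristic_action(pole_angle, rng):
    """Return the action of the rule whose antecedent is most activated."""
    degrees = [
        x_condition_cloud(pole_angle, r["ex"], r["en"], r["he"], rng)
        for r in RULES
    ]
    best = max(range(len(RULES)), key=lambda i: degrees[i])
    return RULES[best]["action"]


def select_action(q_values, pole_angle, epsilon, rng):
    """Epsilon-greedy variant: explore with the rules instead of uniformly."""
    if rng.random() < epsilon:
        return heuristic_action(pole_angle, rng)
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In this sketch the only change from plain epsilon-greedy is the exploration branch: instead of a uniformly random action, the agent takes the action suggested by the most strongly activated rule, while the greedy branch still follows the learned Q-values.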

References

Pages: 244-248

Page count: 5

13 references in total

[1] Bellemare MG, 2016, ADV NEURAL INFORM PR, DOI 10.3390/BS3030459
[2] Bianchi RAC, 2009, LECT NOTES ARTIF INT, V5650, P75, DOI 10.1007/978-3-642-02998-1_7
[3] Davenport T.H., 2000, WORKING KNOWLEDGE OR, DOI 10.1145/347634.348775
[4] Houthooft Rein, 2016, ADV NEURAL INFORM PR, P1109
[5] KUHLMANN G, 2004, P AAAI WORKSH SUP CO, P30
[6] Li, D; Cheung, D; Shi, XM; Ng, V. Uncertainty reasoning based on cloud models in controllers [J]. COMPUTERS & MATHEMATICS WITH APPLICATIONS, 1998, 35 (03): 99-123
[7] Mnih V., 2013, PLAYING ATARI DEEP R
[8] Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis. Human-level control through deep reinforcement learning [J]. NATURE, 2015, 518 (7540): 529-533
[9] Osband I., 2016, ADV NEURAL INFORM PR, P4026
[10] Santos, Matilde; Martin H, Jose Antonio; Lopez, Victoria; Botella, Guillermo. Dyna-H: A heuristic planning reinforcement learning algorithm applied to role-playing game strategy decision systems [J]. KNOWLEDGE-BASED SYSTEMS, 2012, 32: 28-36
