Introspective Q-learning and learning from demonstration

Cited: 2
Authors
Li, Mao [1 ]
Brys, Tim [2 ]
Kudenko, Daniel [1 ,3 ]
Affiliations
[1] Univ York, Comp Sci Dept, York YO10 5GH, N Yorkshire, England
[2] Vrije Univ Brussel, Artificial Intelligence Lab, Comp Sci Dept, Pleinlaan 9, 3rd Floor, B-1050 Brussels, Belgium
[3] JetBrains Res, St Petersburg, Russia
Keywords
Queueing theory; Demonstrations; Domain knowledge
DOI
10.1017/S0269888919000031
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One challenge faced by reinforcement learning (RL) agents is that in many environments the reward signal is sparse, leading to slow improvement of the agent's performance in early learning episodes. Potential-based reward shaping can help resolve this sparse-reward issue by incorporating an expert's domain knowledge into learning through a potential function. Past work on reinforcement learning from demonstration (RLfD) directly mapped a (sub-optimal) human expert demonstration to a potential function, which can speed up RL. In this paper we propose an introspective RL agent that speeds up learning significantly further. An introspective RL agent records its state-action decisions and experience during learning in a priority queue. Decisions judged to be of good quality by a Monte Carlo estimate are kept in the queue, while poorer decisions are rejected. The queue is then used as a demonstration to speed up RL via reward shaping. A human expert's demonstration can be used to initialize the priority queue before the learning process starts. Experimental validation in the 4-dimensional CartPole domain and the 27-dimensional Super Mario AI domain shows that our approach significantly outperforms both non-introspective RL and state-of-the-art RLfD approaches in both domains.
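The mechanism the abstract describes (retain only the decisions with high Monte Carlo returns in a bounded priority queue, seed it from a human demonstration, then shape rewards with a potential derived from the stored decisions) can be sketched as below. This is a minimal illustration, not the paper's exact formulation: the class name, the `capacity` and `sigma` parameters, and the Gaussian state-similarity potential are all assumptions filled in for the sketch.

```python
import heapq
import math

class IntrospectiveAgent:
    """Sketch of the introspection mechanism: a bounded min-heap of
    (Monte Carlo return, state, action) triples plus potential-based
    reward shaping derived from the stored good decisions."""

    def __init__(self, capacity=100, gamma=0.99, sigma=0.5):
        self.queue = []          # min-heap keyed by Monte Carlo return
        self.capacity = capacity
        self.gamma = gamma       # discount factor
        self.sigma = sigma       # width of the similarity kernel (assumed)

    def record_episode(self, trajectory):
        """trajectory: list of (state, action, reward) tuples.
        Score each decision by its Monte Carlo return; once the queue is
        full, poorer decisions are rejected in favour of better ones."""
        G = 0.0
        for state, action, reward in reversed(trajectory):
            G = reward + self.gamma * G
            if len(self.queue) < self.capacity:
                heapq.heappush(self.queue, (G, state, action))
            elif G > self.queue[0][0]:
                heapq.heapreplace(self.queue, (G, state, action))

    def seed_with_demonstration(self, demo):
        """Initialize the queue from a (possibly sub-optimal) human demo,
        treated as just another scored trajectory."""
        self.record_episode(demo)

    def potential(self, state, action):
        """Phi(s, a): similarity of (s, a) to the stored good decisions,
        here a Gaussian over Euclidean state distance (one possible choice)."""
        best = 0.0
        for _, s, a in self.queue:
            if a == action:
                d2 = sum((x - y) ** 2 for x, y in zip(s, state))
                best = max(best, math.exp(-d2 / (2 * self.sigma ** 2)))
        return best

    def shaped_reward(self, r, s, a, s2, a2):
        """Potential-based shaping: r + gamma * Phi(s', a') - Phi(s, a)."""
        return r + self.gamma * self.potential(s2, a2) - self.potential(s, a)
```

Because the shaping term is a difference of potentials, the optimal policy of the underlying task is preserved; the queue only biases exploration toward decisions that previously earned high returns.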
Pages: 11