Introspective Q-learning and learning from demonstration

Cited: 2
Authors
Li, Mao [1 ]
Brys, Tim [2 ]
Kudenko, Daniel [1 ,3 ]
Affiliations
[1] Univ York, Comp Sci Dept, York YO10 5GH, N Yorkshire, England
[2] Vrije Univ Brussel, Artificial Intelligence Lab, Comp Sci Dept, Pleinlaan 9, 3rd Floor, B-1050 Brussels, Belgium
[3] JetBrains Res, St Petersburg, Russia
Keywords
Queueing theory; Demonstrations; Domain knowledge
DOI
10.1017/S0269888919000031
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One challenge faced by reinforcement learning (RL) agents is that in many environments the reward signal is sparse, leading to slow improvement of the agent's performance in early learning episodes. Potential-based reward shaping can help resolve this sparse-reward issue by incorporating an expert's domain knowledge into learning through a potential function. Past work on reinforcement learning from demonstration (RLfD) directly mapped a (sub-optimal) human expert demonstration to a potential function, which can speed up RL. In this paper we propose an introspective RL agent that speeds up learning significantly further. An introspective RL agent records its state-action decisions and experience during learning in a priority queue. Decisions judged to be of good quality by a Monte Carlo estimate are kept in the queue, while poorer decisions are rejected. The queue is then used as a demonstration to speed up RL via reward shaping. A human expert's demonstration can be used to initialize the priority queue before the learning process starts. Experimental validation in the 4-dimensional CartPole domain and the 27-dimensional Super Mario AI domain shows that our approach significantly outperforms both non-introspective RL and state-of-the-art RLfD approaches in both domains.
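The mechanism the abstract describes (retain only the decisions with high Monte Carlo returns in a bounded priority queue, seed it from a human demonstration, then shape rewards with a potential derived from the stored decisions) can be sketched as below. This is a minimal illustration, not the paper's exact formulation: the class name, the `capacity` and `sigma` parameters, and the Gaussian state-similarity potential are all assumptions filled in for the sketch.

```python
import heapq
import math

class IntrospectiveAgent:
    """Sketch of the introspection mechanism: a bounded min-heap of
    (Monte Carlo return, state, action) triples plus potential-based
    reward shaping derived from the stored good decisions."""

    def __init__(self, capacity=100, gamma=0.99, sigma=0.5):
        self.queue = []          # min-heap keyed by Monte Carlo return
        self.capacity = capacity
        self.gamma = gamma       # discount factor
        self.sigma = sigma       # width of the similarity kernel (assumed)

    def record_episode(self, trajectory):
        """trajectory: list of (state, action, reward) tuples.
        Score each decision by its Monte Carlo return; once the queue is
        full, poorer decisions are rejected in favour of better ones."""
        G = 0.0
        for state, action, reward in reversed(trajectory):
            G = reward + self.gamma * G
            if len(self.queue) < self.capacity:
                heapq.heappush(self.queue, (G, state, action))
            elif G > self.queue[0][0]:
                heapq.heapreplace(self.queue, (G, state, action))

    def seed_with_demonstration(self, demo):
        """Initialize the queue from a (possibly sub-optimal) human demo,
        treated as just another scored trajectory."""
        self.record_episode(demo)

    def potential(self, state, action):
        """Phi(s, a): similarity of (s, a) to the stored good decisions,
        here a Gaussian over Euclidean state distance (one possible choice)."""
        best = 0.0
        for _, s, a in self.queue:
            if a == action:
                d2 = sum((x - y) ** 2 for x, y in zip(s, state))
                best = max(best, math.exp(-d2 / (2 * self.sigma ** 2)))
        return best

    def shaped_reward(self, r, s, a, s2, a2):
        """Potential-based shaping: r + gamma * Phi(s', a') - Phi(s, a)."""
        return r + self.gamma * self.potential(s2, a2) - self.potential(s, a)
```

Because the shaping term is a difference of potentials, the optimal policy of the underlying task is preserved; the queue only biases exploration toward decisions that previously earned high returns.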
Pages: 11