"I'm Sorry Dave, I'm Afraid I Can't Do That" Deep Q-Learning from Forbidden Actions

Cited by: 0
Authors
Seurin, Mathieu [1]
Preux, Philippe [1]
Pietquin, Olivier [2]
Affiliations
[1] Univ Lille, CNRS, INRIA, UMR 9189 CRIStAL, Lille, France
[2] Google Res, Brain Team, Mountain View, CA USA
Keywords
Deep Reinforcement Learning; Safety; Constraints; Q-Learning
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The use of Reinforcement Learning (RL) is still restricted to simulation or to enhancing human-operated systems through recommendations. Real-world environments (e.g. industrial robots or power grids) are generally designed with safety constraints in mind, implemented as valid-action masks or contingency controllers. For example, the range of motion and the angles of a robot's motors can be limited to physical boundaries. Violating constraints thus results in rejected actions or in entering a safe mode driven by an external controller, which leaves RL agents unable to learn from their mistakes. In this paper, we propose a simple modification of a state-of-the-art deep RL algorithm (DQN) that enables learning from forbidden actions. To do so, the standard Q-learning update is augmented with an extra safety loss inspired by structured classification. We empirically show that it reduces the number of constraint violations during the learning phase and accelerates convergence to near-optimal policies compared to standard DQN. Experiments are conducted on a Visual Grid World environment and the TextWorld domain.
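The abstract does not give the form of the safety loss, so the following is only a minimal sketch of how a Q-learning loss might be combined with a margin-style penalty on a forbidden action. The hinge form, the function names, and the `margin` and `safety_weight` parameters are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Sequence


def td_loss(q_sa: float, reward: float, next_q: Sequence[float],
            gamma: float = 0.99, done: bool = False) -> float:
    """Squared one-step Q-learning (TD) error for a single transition."""
    target = reward if done else reward + gamma * max(next_q)
    return (q_sa - target) ** 2


def forbidden_margin_loss(q_values: Sequence[float], forbidden_action: int,
                          margin: float = 1.0) -> float:
    """Hypothetical hinge penalty (an assumption, not the paper's formula).

    It is zero once the forbidden action's Q-value sits at least `margin`
    below the smallest Q-value among the other actions.
    Requires at least two actions.
    """
    others = [q for i, q in enumerate(q_values) if i != forbidden_action]
    return max(0.0, q_values[forbidden_action] + margin - min(others))


def combined_loss(q_values: Sequence[float], action: int, reward: float,
                  next_q: Sequence[float], was_forbidden: bool,
                  gamma: float = 0.99, margin: float = 1.0,
                  safety_weight: float = 1.0, done: bool = False) -> float:
    """TD loss plus the weighted margin penalty when the action was rejected."""
    loss = td_loss(q_values[action], reward, next_q, gamma, done)
    if was_forbidden:
        loss += safety_weight * forbidden_margin_loss(q_values, action, margin)
    return loss
```

In this sketch the penalty only applies to transitions where the environment rejected the chosen action, so ordinary transitions are trained exactly as in standard Q-learning. In a deep RL setting the same two terms would be computed on batches of network outputs.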
Pages: 8