Probabilistic Policy Reuse for Safe Reinforcement Learning

Cited by: 6
Authors
Garcia, Javier [1]
Fernandez, Fernando [1]
Affiliations
[1] Univ Carlos III Madrid, Ave Univ, 30, Leganes 28911, Spain
Keywords
Reinforcement learning; case-based reasoning; software agents;
DOI
10.1145/3310090
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This work introduces Policy Reuse for Safe Reinforcement Learning, an algorithm that combines Probabilistic Policy Reuse and teacher advice for safe exploration in dangerous, continuous state and action reinforcement learning problems in which the dynamic behavior is reasonably smooth and the space is Euclidean. The algorithm uses a continuously increasing monotonic risk function that estimates the probability of ending up in failure from a given state. Such a risk function is defined in terms of how far that state is from the part of the state space already known by the learning agent. Probabilistic Policy Reuse is used to safely balance the exploitation of already-learned knowledge, the exploration of new actions, and the request of teacher advice in parts of the state space considered dangerous. Specifically, the pi-reuse exploration strategy is used. Using experiments in the helicopter hover task and a business management problem, we show that the pi-reuse exploration strategy can completely avoid visits to undesirable situations while maintaining the performance (in terms of the classical long-term accumulated reward) of the final policy achieved.
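The abstract's mechanism can be sketched in a few lines: a risk function that grows monotonically with the distance from the current state to the nearest known state, and a pi-reuse action selector that consults the teacher in high-risk regions and otherwise mixes reuse of the past (teacher) policy with epsilon-greedy use of the policy being learned. All names, the exponential risk shape, and the parameter values below are illustrative assumptions, not the paper's actual formulation:

```python
import math
import random

def risk(state, known_states, tau=1.0):
    # Monotonically increasing risk in the Euclidean distance from
    # `state` to the closest state the agent already knows; 0 at a
    # known state, approaching 1 far from the known space.
    d = min(math.dist(state, s) for s in known_states)
    return 1.0 - math.exp(-d / tau)

def pi_reuse_action(state, known_states, teacher, new_policy, actions,
                    psi=0.9, epsilon=0.1, risk_threshold=0.5):
    # Dangerous region: always request teacher advice.
    if risk(state, known_states) > risk_threshold:
        return teacher(state)
    # With probability psi, reuse the past (teacher) policy.
    if random.random() < psi:
        return teacher(state)
    # Otherwise act epsilon-greedily with the policy being learned.
    if random.random() < epsilon:
        return random.choice(actions)   # explore a new action
    return new_policy(state)            # exploit learned knowledge
```

In the pi-reuse strategy proper, psi typically decays over an episode so that reuse dominates early steps and the learned policy takes over later; the sketch keeps it fixed for brevity.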
Pages: 24
Related Papers
50 records in total (showing 41-50)
  • [41] Safe Reinforcement Learning-based Driving Policy Design for Autonomous Vehicles on Highways
    Nguyen, Hung Duy
    Han, Kyoungseok
    International Journal of Control, Automation and Systems, 2023, 21: 4098-4110
  • [42] Decentralized Policy Gradient Descent Ascent for Safe Multi-Agent Reinforcement Learning
    Lu, Songtao
    Zhang, Kaiqing
    Chen, Tianyi
    Basar, Tamer
    Horesh, Lior
    Thirty-Fifth AAAI Conference on Artificial Intelligence / Thirty-Third Conference on Innovative Applications of Artificial Intelligence / Eleventh Symposium on Educational Advances in Artificial Intelligence (AAAI 2021), 2021, 35: 8767-8775
  • [43] What Is Acceptably Safe for Reinforcement Learning?
    Bragg, John
    Habli, Ibrahim
    Computer Safety, Reliability, and Security (SAFECOMP 2018), 2018, 11094: 418-430
  • [44] A comprehensive survey on safe reinforcement learning
    García, Javier
    Fernández, Fernando
    Journal of Machine Learning Research, 2015, 16: 1437-1480
  • [45] Safe Reinforcement Learning for Sepsis Treatment
    Jia, Yan
    Burden, John
    Lawton, Tom
    Habli, Ibrahim
    2020 8th IEEE International Conference on Healthcare Informatics (ICHI 2020), 2020: 108-114
  • [46] Safe Reinforcement Learning With Dual Robustness
    Li, Zeyang
    Hu, Chuxiong
    Wang, Yunan
    Yang, Yujie
    Li, Shengbo Eben
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46 (12): 10876-10890
  • [47] Lyapunov design for safe reinforcement learning
    Perkins, TJ
    Barto, AG
    Journal of Machine Learning Research, 2003, 3 (4-5): 803-832
  • [48] Safe reinforcement learning for dynamical games
    Yang, Yongliang
    Vamvoudakis, Kyriakos G.
    Modares, Hamidreza
    International Journal of Robust and Nonlinear Control, 2020, 30 (9): 3706-3726
  • [49] Safe Reinforcement Learning via Shielding
    Alshiekh, Mohammed
    Bloem, Roderick
    Ehlers, Ruediger
    Koenighofer, Bettina
    Niekum, Scott
    Topcu, Ufuk
    Thirty-Second AAAI Conference on Artificial Intelligence / Thirtieth Innovative Applications of Artificial Intelligence Conference / Eighth AAAI Symposium on Educational Advances in Artificial Intelligence (AAAI 2018), 2018: 2669-2678
  • [50] Safe Reinforcement Learning for Legged Locomotion
    Yang, Tsung-Yen
    Zhang, Tingnan
    Luu, Linda
    Ha, Sehoon
    Tan, Jie
    Yu, Wenhao
    2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022: 2454-2461