Probabilistic Policy Reuse for Safe Reinforcement Learning

Cited: 6
Authors
Garcia, Javier [1]
Fernandez, Fernando [1]
Institutions
[1] Univ Carlos III Madrid, Ave Univ, 30, Leganes 28911, Spain
Keywords
Reinforcement learning; case-based reasoning; software agents
DOI
10.1145/3310090
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
This work introduces Policy Reuse for Safe Reinforcement Learning, an algorithm that combines Probabilistic Policy Reuse and teacher advice for safe exploration in dangerous reinforcement learning problems with continuous state and action spaces, under the assumptions that the dynamics are reasonably smooth and the state space is Euclidean. The algorithm uses a monotonically increasing risk function that estimates the probability of ending up in failure from a given state; the risk of a state is defined in terms of how far it lies from the region of the state space already known to the learning agent. Probabilistic Policy Reuse is used to safely balance the exploitation of the knowledge learned so far, the exploration of new actions, and requests for teacher advice in parts of the state space considered dangerous. Specifically, the π-reuse exploration strategy is used. Through experiments in the helicopter hover task and a business management problem, we show that the π-reuse exploration strategy can completely avoid visits to undesirable situations while maintaining the performance (in terms of the classical long-term accumulated reward) of the final policy.
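The abstract describes two mechanisms precisely enough to sketch: a distance-based risk estimate over a Euclidean state space, and a π-reuse rule that mixes teacher advice, exploitation, and exploration. Below is a minimal illustrative sketch in Python; all names (`risk`, `pi_reuse_action`, `d_safe`, `psi`, and the stand-in policies) are assumptions for illustration, not the authors' implementation, and coupling the reuse probability to the risk estimate is one plausible reading of the abstract.

```python
import numpy as np

def risk(state, known_states, d_safe=1.0):
    """Monotonically increasing risk in the Euclidean distance from the
    states the agent already knows; saturates at 1 (certain failure).
    A sketch of the abstract's risk function, not the paper's definition."""
    d = min(np.linalg.norm(state - s) for s in known_states)
    return min(d / d_safe, 1.0)

def pi_reuse_action(state, teacher_policy, learned_policy, known_states,
                    psi=0.5, epsilon=0.1, rng=np.random):
    """One pi-reuse exploration step: follow the teacher with probability
    psi, otherwise act epsilon-greedily with the learned policy. Raising
    psi with the risk estimate (an assumption here) requests teacher
    advice more often in dangerous, poorly known parts of the space."""
    psi_eff = max(psi, risk(state, known_states))  # more reuse where riskier
    if rng.random() < psi_eff:
        return teacher_policy(state)               # request teacher advice
    if rng.random() < epsilon:
        # explore: perturb the learned action (continuous action space)
        return learned_policy(state) + rng.normal(scale=0.1)
    return learned_policy(state)                   # exploit learned knowledge

# Toy usage: 2-D states, scalar actions, stand-in policies.
known = [np.zeros(2), np.array([0.5, 0.5])]
teacher = lambda s: 0.0                 # stand-in safe controller
learned = lambda s: float(s.sum())      # stand-in learned policy
a = pi_reuse_action(np.array([2.0, 2.0]), teacher, learned, known)
```

In this reading, `psi` plays the role of the π-reuse teacher-advice probability, and the risk term pushes it toward 1 as the agent drifts away from the states it already knows, which is how the sketch avoids free exploration in unfamiliar regions.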
Pages: 24
Related Papers
50 items in total
  • [31] Safe Reinforcement Learning for Autonomous Vehicles through Parallel Constrained Policy Optimization. Wen, Lu; Duan, Jingliang; Li, Shengbo Eben; Xu, Shaobing; Peng, Huei. 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), 2020.
  • [32] Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis. Mitta, Rohan; Hasanbeig, Hosein; Wang, Jun; Kroening, Daniel; Kantaros, Yiannis; Abate, Alessandro. Thirty-Eighth AAAI Conference on Artificial Intelligence, Vol. 38, No. 19, 2024: 21412-21419.
  • [33] Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning. Horie, Naoto; Matsui, Tohgoroh; Moriyama, Koichi; Mutoh, Atsuko; Inuzuka, Nobuhiro. Artificial Life and Robotics, 2019, 24(3): 352-359.
  • [34] Safe Reinforcement Learning: A Survey. Wang, X.-S.; Wang, R.-R.; Cheng, Y.-H. Zidonghua Xuebao/Acta Automatica Sinica, 2023, 49(9): 1813-1835.
  • [36] Safe Reinforcement Learning with Probabilistic Guarantees Satisfying Temporal Logic Specifications in Continuous Action Spaces. Krasowski, Hanna; Akella, Prithvi; Ames, Aaron D.; Althoff, Matthias. 2023 62nd IEEE Conference on Decision and Control (CDC), 2023: 4372-4378.
  • [37] Quantum error correction for heavy hexagonal code using deep reinforcement learning with policy reuse. Ji, Yuxin; Chen, Qinghui; Wang, Rui; Ji, Naihua; Ma, Hongyang. Quantum Information Processing, 2024, 23(7).
  • [38] Off-policy safe reinforcement learning for nonlinear discrete-time systems. Jha, Mayank Shekhar; Kiumarsi, Bahare. Neurocomputing, 2025, 611.
  • [39] Joint Synthesis of Safety Certificate and Safe Control Policy using Constrained Reinforcement Learning. Ma, Haitong; Liu, Changliu; Li, Shengbo Eben; Zheng, Sifa; Chen, Jianyu. Learning for Dynamics and Control Conference, Vol. 168, 2022.
  • [40] Safe Reinforcement Learning-based Driving Policy Design for Autonomous Vehicles on Highways. Nguyen, Hung Duy; Han, Kyoungseok. International Journal of Control, Automation and Systems, 2023, 21(12): 4098-4110.